Hobby Horses and Detail Devils
Transparency in Digital Humanities Research and
Infrastructure
Marijn Koolen
Team R&D - Digital Infrastructure - KNAW Humanities Cluster
HuC Lecture - 28 May 2019 - IISH, Amsterdam
Slide URL: http://bit.ly/HuC-2019-Hobby-Horses
Hobby Horses
Overview
Digital Tool Criticism - reflection on tool use
Data Scopes - conceptual framework for data transformations
Documenting Research Process - Tracing choices and decisions
Digital Tool Criticism
Critical reflection on tool use
Tools and Assumptions
● Input data needs to be interpreted to be usable by a digital tool
○ Algorithms rely on many assumptions and expectations
○ For instance, textual tools often assume modern English text
■ Regardless of the actual language of your data
○ If you apply such a tool to Dutch text, you (implicitly) assume that, for your research purpose, Dutch has the same characteristics as English
● Which assumptions should we be aware of?
Developing Methods for Digital Tool Criticism
● Workshops
○ Tool Crit. 2015
○ DH Benelux 2017 & 2018
○ Research master Media, Art and Performance Studies - Utrecht University
● Experiments in using tools to
○ Develop research questions
○ Analyse online datasets
○ Reproduce published research
● Work in small groups
○ Keep a collaborative research journal
○ Reflect on process through journal
Visualizing Research Journeys
We analysed the research journals of the participant groups
Colour-coded the notes based on 5 aspects:
Research question
Method
Tool
Dataset
Reflection (hard to read: it’s yellow and says “Reflection”)
Coding Collaborative Journals
Research DNA Visualizations
Koolen, van Gorp, van Ossenbruggen 2018
● Participants liked Collaboration and Experimentation
● Collaboratively using tools prompts discussions
○ Face-to-face: collaboratively looking under the hood and its consequences
○ Explaining how you think it works is a great way to bring out gaps in your own understanding
(Sloman and Fernbach 2017)
● Many research questions require a wide range of skills
○ Need to collaborate to ensure at least someone involved understands specific tool details
● Experimentation with tools to deepen understanding
○ Compare intermediate output with input -> What has changed? What has disappeared?
○ Try different settings and compare intermediate output -> What is different?
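The "compare intermediate output with input" experiment can be tried on a small scale in Python. The sketch below assumes a hypothetical tool whose tokenizer only treats ASCII letters as word characters, an English-centred assumption of the kind discussed above, and diffs its output against a Unicode-aware tokenization to see what disappeared and what appeared.

```python
import re

def ascii_tokenize(text: str) -> list[str]:
    # Hypothetical English-centred tokenizer: only A-Z and a-z count as letters.
    return re.findall(r"[A-Za-z]+", text)

def unicode_tokenize(text: str) -> list[str]:
    # Reference tokenization: in Python 3, \w also matches letters such as 'é'.
    return re.findall(r"\w+", text)

def compare_tokens(before: list[str], after: list[str]) -> tuple[list[str], list[str]]:
    """Return (disappeared, appeared): tokens only in the input, tokens only in the output."""
    disappeared = sorted(set(before) - set(after))
    appeared = sorted(set(after) - set(before))
    return disappeared, appeared

sentence = "Ik heb het boek in één ruk uitgelezen"
disappeared, appeared = compare_tokens(unicode_tokenize(sentence), ascii_tokenize(sentence))
print("disappeared:", disappeared)  # disappeared: ['één']
print("appeared:", appeared)        # appeared: ['n']
```

The ASCII tokenizer does not fail loudly: it silently turns "één" into "n", which only shows up when input and output are compared.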
Findings on the Workshop Format
● Effective elements of the workshop format
○ Answer a series of questions about the tool and the data:
■ Who made them, when, why, what for, with what assumptions?
■ Similar to source criticism
○ Focus on integrative reflection
■ Need to critically reflect on tools in combination with other elements of research design
● Research questions
● Methods
● Digital tools
● Digital data
Lessons on Tool Criticism
Model: Reflection as Integrative Practice
An interactive model of digital tool criticism, where reflection integrates the four concepts of research
questions, methods, data and tools as interactive and interdependent parts of the research process
(Koolen, van Gorp & van Ossenbruggen 2019)
Entanglement of Data and Tools
Each step changes the underlying data!
Lessons on Tool Criticism
● To what extent should we understand tool details?
○ At the level of data transformations (echoing Ben Schmidt 2016)
○ And how does that change our interpretation?
● To what extent can we develop tools and interfaces that support this?
○ Prioritize documentation
○ Build in elements that encourage reflection
Search Engines: Experts in Retrieval, Masters in Hiding
● The inspection tool in the CLARIAH Media Suite is a first attempt
○ What are other ways to identify and flag issues in data and tools?
○ Input from researchers and developers needed!
● How do (or could) other frequently used search interfaces deal with transparency?
○ Nederlab
○ Delpher
○ WorldCat
○ Pica
○ Google
Reflection and Transparency in Tool Interfaces
● List of recommendations (Koolen, van Gorp, van Ossenbruggen - DSH 2018)
● For researchers:
○ Incorporate digital source, data and tool criticism in research process
○ Explicitly ask and answer questions about assumptions, choices, limitations
■ Document and share workarounds
○ Develop method of experimentation with tool to test functioning
○ Document research process
● For tool developers and data providers (and researchers sharing datasets):
○ Add an “About” page and documentation on functionalities
○ Design UIs so as to encourage reflection!
○ Describe selection criteria and transformations of data sets
Recommendations
Taking Up Recommendations
Framework for thinking about data transformations
Support for documenting research process
Data Scopes
Conceptual framework for data transformations
● Data needs processing to offer insights for research questions
○ Making data transformations offers different perspectives or scopes on data
■ Often left out of publications, outsourced as “technical detail”
■ But details matter, process is intellectual effort!
● Data Scopes concept (Hoekstra and Koolen 2018)
○ Framework for thinking and communicating about research data processing
○ Especially for combining data from different sources
● Five types of transforming activities:
○ Selecting
○ Modeling
○ Normalizing
○ Linking
○ Classifying
● A form of scholarly primitives (Unsworth 2000, Anderson et al. 2010)
Use Case 1: Analyzing Online Book Reviews
● Online book response corpus (Boot 2017)
○ ~400,000 book reviews in Dutch
○ From different review sites (Bol, Hebban, Dizzie, Wat Lees Jij Nu, …)
● Research questions
○ What impact does reading fiction have on readers?
■ How do reviewers describe the impact of a book?
■ Are there differences across genres/authors?
Use Case 2: VOC Maritime Careers
● Pay ledgers of VOC personnel
○ 774,200 contracts between the VOC and individual persons
○ Career transitions: two subsequent contracts of the same person
● Research questions
○ What are typical career paths for VOC personnel?
○ Do migrants have different careers or chances of promotion than non-migrants?
Reading Impact
“Je gaat Stijn eigenlijk een beetje begrijpen, …”
“You start to understand Stijn, …”
“Helaas is de schrijfstijl bedroevend, presenteert Kluun zich als een pseudo-intellectueel …”
“Unfortunately the writing style is pathetic, Kluun presents himself as a pseudo-intellectual …”
Reading Impact Rules
● 349 rules
○ Identifying 4 types of impact: general, narrative, style, reflection
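● Term: boeiend ("engaging", "gripping")
○ Rule 2: Style impact: boeiend + style term (“boeiend taalgebruik”, "engaging use of language")
○ Rule 3: Reflection: boeiend + topic term (“boeiende thematiek”, "engaging themes")
● Phrase: in één ruk * uitlezen ("read in one go")
○ Rule 79: General impact (“Ik heb het boek in één ruk helemaal uitgelezen.”, "I read the whole book in one go.")
○ Many variants
■ één/een/1 ("one")
■ adem/avond/dag/keer/middag/ruk/stuk/zucht/... (breath/evening/day/time/afternoon/go/stretch/sigh/...)
■ uitlezen/uit ("finish reading")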
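A rule like rule 79 can be approximated with a regular expression. This is a sketch under stated assumptions, not the actual rule set: the original rules are applied to lemmatised sentences, whereas this version matches surface forms, lists a few inflections of "uitlezen" explicitly, and limits the wildcard to at most three intervening words. The noun list is the partial one from the slide.

```python
import re

# Hypothetical surface-form approximation of rule 79 ("in één ruk * uitlezen").
RULE_79_GENERAL_IMPACT = re.compile(
    r"\bin (?:één|een|1) "                                # "in one": three spellings
    r"(?:adem|avond|dag|keer|middag|ruk|stuk|zucht)\b"    # partial noun list from the slide
    r"(?:\s+\w+){0,3}?"                                   # the "*": up to 3 words (assumed limit)
    r"\s+(?:uit|uitlezen|uitgelezen|uitlas|uitlazen)\b",  # assumed forms of "uitlezen"
    re.IGNORECASE,
)

def matches_rule_79(sentence: str) -> bool:
    """True if the sentence expresses general impact according to this sketch of rule 79."""
    return RULE_79_GENERAL_IMPACT.search(sentence) is not None

for s in ["Ik heb het boek in één ruk helemaal uitgelezen.",
          "Ik las het in 1 avond uit.",
          "Het boek was uitstekend."]:
    print(matches_rule_79(s), s)
```

The word boundary after the verb forms keeps words like "uitstekend" ("excellent") from matching; matching on lemmas, as the original pipeline does, would make the explicit list of inflections unnecessary.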
Data Scopes for Reading Impact Analysis (1/2)
● Gather review data
○ Select review sites
○ Model review from web page (book title, author, ISBN, reviewer, date, rating, website, …)
○ Link author and book title to WorldCat record (for missing data)
■ Select ISBN, publisher, publication year
○ Link ISBN to record in boek.nl database
■ Select genre classification (NUR code)
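The linking steps amount to lookups that may or may not succeed. In this sketch the two dictionaries are hypothetical stand-ins for the WorldCat and boek.nl lookups (not their actual APIs), and the authors, titles and ISBNs are invented placeholders. Returning the reviews that could not be linked keeps that side effect visible.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Review:
    author: str
    title: str
    isbn: Optional[str] = None   # may be missing on the review page
    nur: Optional[int] = None    # genre code, filled in by linking

# Hypothetical stand-ins for the external lookups (not real WorldCat/boek.nl calls).
CATALOGUE = {("author a", "title x"): "isbn-A"}   # (author, title) -> ISBN
GENRE_DB = {"isbn-A": 305}                        # ISBN -> NUR code

def link(reviews: list[Review]) -> list[Review]:
    """Fill in missing ISBNs and NUR codes in place; return the reviews that stay incomplete."""
    incomplete = []
    for review in reviews:
        if review.isbn is None:
            review.isbn = CATALOGUE.get((review.author.lower(), review.title.lower()))
        if review.isbn is not None:
            review.nur = GENRE_DB.get(review.isbn)
        if review.isbn is None or review.nur is None:
            incomplete.append(review)
    return incomplete

reviews = [Review("Author A", "Title X"), Review("Author B", "Title Y")]
incomplete = link(reviews)
print(reviews[0].isbn, reviews[0].nur)   # isbn-A 305
print(len(incomplete))                   # 1
```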
Data Scopes for Reading Impact Analysis (2/2)
● Extract impact expressions
○ Select individual sentences from book reviews
○ Normalise words in sentences to their lemmas
○ Select all sentences that match an impact rule
○ Classify sentences by impact rule
● Analyse impact
○ Select impact matches by book genre or author or book ID or reviewer ID
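Once sentences are classified by impact rule, the analysis step is a grouping operation. The sketch below counts impact types per NUR genre; the match list and the review-to-genre table are invented toy data.

```python
from collections import Counter

# Invented toy data: (review_id, impact_type) pairs from the classification step,
# and a lookup from review_id to NUR genre code.
matches = [(1, "general"), (1, "style"), (2, "narrative"), (3, "general")]
genre_of_review = {1: 305, 2: 342, 3: 305}

def impact_by_genre(matches, genre_of_review) -> Counter:
    """Count (genre, impact type) pairs; reviews without a known genre are counted under None."""
    counts = Counter()
    for review_id, impact in matches:
        counts[(genre_of_review.get(review_id), impact)] += 1
    return counts

counts = impact_by_genre(matches, genre_of_review)
print(counts[(305, "general")])   # 2
```

Raw counts like these grow with the number of reviews per genre, so comparing genres also requires deciding how to normalise them, which is another transformation worth documenting.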
Intended and Unintended Selection
● Extract domain specific sentiment lexicon
○ Compare positive and negative reviews
● Intended selection
○ Positive: 4+5 star reviews
○ Negative: 1+2 star reviews
● Unintended selection
○ Bol.com and Hebban.nl: all reviews have a star rating
○ Leestafel.info: none of the reviews have a rating
○ Selections exclude all Leestafel.info reviews
○ Consequence: Leestafel.info reviews not represented in sentiment lexicon!
● Selection choices lead to side effects!
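The side effect can be made visible by counting records per site before and after the selection. The records below are invented; only the site names and the star-rating criteria come from the slide.

```python
from collections import Counter

# Invented toy records: (site, star rating, or None when the site has no ratings).
reviews = [("bol.com", 5), ("bol.com", 1), ("hebban.nl", 4), ("hebban.nl", 2),
           ("leestafel.info", None), ("leestafel.info", None)]

def select(reviews, stars):
    # Intended selection: keep reviews whose rating is in `stars`.
    # Unintended side effect: a rating of None is never in `stars`.
    return [r for r in reviews if r[1] in stars]

positive = select(reviews, {4, 5})
negative = select(reviews, {1, 2})

before = Counter(site for site, _ in reviews)
after = Counter(site for site, _ in positive + negative)
for site in before:
    print(site, before[site], "->", after[site])
```

The per-site counts show the Leestafel.info reviews dropping from 2 to 0, because a missing rating never equals 1, 2, 4 or 5.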
Modelling and Consequences
● Review text
○ Extract text only, ignore images and emoji
● Book identifier
○ To group reviews of the same book
○ ISBN? Not always present, and each edition has its own ISBN
● Dates:
○ Underspecified dates cause undefined behaviour when sorting by date
● Ratings:
○ Some review sites use a 5-star rating system, some allow half stars
○ Mixing different rating systems results in odd distributions
■ How do you compare 4-star reviews to 5-star reviews?
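One way to make underspecified dates explicit is to treat each date as the range of days it may denote. The sketch assumes dates arrive as "YYYY", "YYYY-MM" or "YYYY-MM-DD" strings; the slide does not say which formats the review sites actually use.

```python
import calendar
from datetime import date

def date_bounds(value: str) -> tuple[date, date]:
    """Earliest and latest day that a YYYY, YYYY-MM or YYYY-MM-DD string can denote."""
    parts = [int(p) for p in value.split("-")]
    if len(parts) == 1:
        return date(parts[0], 1, 1), date(parts[0], 12, 31)
    if len(parts) == 2:
        year, month = parts
        last_day = calendar.monthrange(year, month)[1]   # number of days in that month
        return date(year, month, 1), date(year, month, last_day)
    year, month, day = parts
    return date(year, month, day), date(year, month, day)

def certainly_before(a: str, b: str) -> bool:
    """True only if every day `a` may denote lies before every day `b` may denote."""
    return date_bounds(a)[1] < date_bounds(b)[0]

print(certainly_before("2016", "2017-05"))      # True
print(certainly_before("2017", "2017-05-03"))   # False: the order is undefined
print(certainly_before("2017-05-03", "2017"))   # False as well
```

Any sort order for overlapping ranges (by earliest day, latest day or midpoint) is a modelling choice, and recording which one was used keeps it from becoming a hidden decision.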
Linking
● Add missing data via external sources
○ Missing ISBN, publisher and publication date info
● Add contextual data for interpretation
○ Genre/subject classification
Classification
● Reduce complexity by grouping on common characteristics
● Boundaries are often arbitrary
○ NUR code: Genre/subject classification (based on Dewey Decimal Classification)
■ 302 - Translated literary novel
■ 305 - Literary Thriller
■ 331 - Detective
■ 342 - Historical novel
○ Different editions of the same book can get different classifications!
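Conflicting classifications can be detected by grouping editions of the same work. In this sketch, grouping by author and title is an assumption (itself a modelling choice), and the records are invented; only the NUR codes are taken from the slide.

```python
from collections import defaultdict

# Invented editions: (author, title, ISBN, NUR code).
editions = [
    ("Author A", "Title X", "isbn-1", 305),
    ("Author A", "Title X", "isbn-2", 331),
    ("Author B", "Title Y", "isbn-3", 342),
]

def conflicting_classifications(editions):
    """Group editions by (author, title); return works whose editions have different NUR codes."""
    codes = defaultdict(set)
    for author, title, _isbn, nur in editions:
        codes[(author, title)].add(nur)
    return {work: sorted(nurs) for work, nurs in codes.items() if len(nurs) > 1}

print(conflicting_classifications(editions))
# {('Author A', 'Title X'): [305, 331]}
```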
Maritime Career Data Scopes
● Selection: VOC pay ledgers between 1680 and 1794
○ Some ledgers are missing; before 1680, most are missing
● Modeling: two pay-ledger entries refer to the same person if
○ Entries have similar full names
○ Same place of origin
○ Gap between subsequent contracts is less than 6 years
○ The person didn’t die during the first contract period
● Normalizing: modernize and standardize spelling of names and ranks
● Classification: assign each entry to a person ID, and each rank to an occupation classification
● Linking: place of origin to Geo database
○ Is place of origin within or outside the Dutch Republic?
○ Because we’re interested in migrant vs. non-migrant
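The four modelling criteria for ledger entries translate into a simple rule-based matcher. The similarity measure (Python's difflib.SequenceMatcher), the 0.85 name-similarity threshold and the example entries are assumptions for illustration; the slide does not say how name similarity was computed.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass
class LedgerEntry:
    name: str              # normalised full name
    origin: str            # normalised place of origin
    start_year: int        # year the contract started
    end_year: int          # year the contract ended
    died_in_service: bool  # did the person die during this contract?

def same_person(first: LedgerEntry, second: LedgerEntry,
                name_threshold: float = 0.85, max_gap: int = 6) -> bool:
    """Apply the four criteria from the slide to two subsequent entries.
    name_threshold is an illustrative assumption, not the project's setting."""
    name_similarity = SequenceMatcher(None, first.name.lower(), second.name.lower()).ratio()
    gap = second.start_year - first.end_year
    return (name_similarity >= name_threshold
            and first.origin.lower() == second.origin.lower()
            and 0 <= gap < max_gap
            and not first.died_in_service)

# Invented example entries.
entry_1 = LedgerEntry("Jan Pietersz van Hoorn", "Hoorn", 1700, 1704, False)
entry_2 = LedgerEntry("Jan Pieterse van Hoorn", "Hoorn", 1706, 1710, False)
entry_3 = LedgerEntry("Jan Pietersz van Hoorn", "Hoorn", 1715, 1719, False)

print(same_person(entry_1, entry_2))  # True: similar name, same origin, 2-year gap
print(same_person(entry_1, entry_3))  # False: 11-year gap
```

Each threshold here is a research decision: raising the name threshold or shortening the allowed gap changes which careers exist in the resulting dataset.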
Normalizing Person and Place Names
Career Transitions - Ship Gunner
Career Transitions - First Corporal
Career Transitions - Intermediate Results
Data Distributions
● Many phenomena in data have a skewed distribution
○ A few high-frequency items, many low-frequency items: the long tail
○ Descriptive statistics like the mean/average are not very useful
● Long tails appear everywhere
○ Maritime data: 197 distinct ranks, the top 2 (or 1%) cover ~50% of the data
○ Book reviews: >33,000 authors, the top 330 (1%) cover ~37% of the data
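The head-coverage figures above can be computed with a few lines of Python. The function below takes the most frequent 1% of distinct items, rounded up to at least one item (a choice that reproduces "top 2" for 197 ranks and "top 330" for 33,000 authors), and returns the share of all observations they account for. The example counts are invented.

```python
from collections import Counter

def head_size(n_items: int, head_percent: int = 1) -> int:
    """Number of items in the top head_percent% of n_items, rounded up, at least 1."""
    return max(1, (n_items * head_percent + 99) // 100)   # integer ceiling division

def head_coverage(counts: Counter, head_percent: int = 1) -> float:
    """Share of all observations accounted for by the most frequent head_percent% of items."""
    n_head = head_size(len(counts), head_percent)
    head_total = sum(freq for _, freq in counts.most_common(n_head))
    return head_total / sum(counts.values())

# Invented long-tailed counts: two frequent items and 98 items seen once.
counts = Counter({"a": 60, "b": 30, **{f"rare{i}": 1 for i in range(98)}})
print(head_size(197), head_size(33_000))   # 2 330
print(round(head_coverage(counts), 3))     # 0.319
```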
Data Distributions - Long Tails and Analysis
● It is easy to focus on the head of the distribution: the small set of most frequent items
○ But they are not representative, as the vast majority is different
○ But the long tail has too many items to analyze in detail
● Use classification to group low-frequency items
○ Group ranks by type and level: naval/military/craftsmen, first/second/third
○ Group books/authors by genre
○ But usually the same problem reappears: a few large groups, many small groups
○ Variance within large groups is bigger than between groups
Wrap Data Scopes
● This process is not “mere preparation”
○ but part of the “real research”
● Process is complex, takes intellectual effort
○ Requires both technical and domain knowledge and interpretation
○ Different choices can lead to very different analyses and interpretations
○ Break down complexity by engaging with intermediate results
○ Tools should be transparent about transformations, show intermediate results
● Hidden choices, hidden assumptions
○ Even if you didn’t consider a certain transformation you still made a choice!
○ All transformations you don’t consider explicitly are implicit decisions: either that they are irrelevant, or that they shouldn’t be done!
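A minimal way for a tool to show intermediate results is to run the transformations as named steps and log what each step does to the data. The step names and toy records below are hypothetical.

```python
from typing import Callable

Step = tuple[str, Callable[[list], list]]

def run_pipeline(records: list, steps: list[Step]) -> tuple[list, list[str]]:
    """Apply named steps in order; log the number of records before and after each one."""
    log = []
    for name, transform in steps:
        before = len(records)
        records = transform(records)
        log.append(f"{name}: {before} -> {len(records)} records")
    return records, log

# Hypothetical steps on invented review records.
steps: list[Step] = [
    ("select rated reviews", lambda rs: [r for r in rs if r["rating"] is not None]),
    ("normalise text", lambda rs: [{**r, "text": r["text"].lower()} for r in rs]),
]
reviews = [{"text": "Boeiend!", "rating": 4}, {"text": "Saai.", "rating": None}]

result, log = run_pipeline(reviews, steps)
for line in log:
    print(line)
```

Record counts only expose selection effects; a step like normalisation keeps the count unchanged while altering values, so comparing samples before and after each step is still needed.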
Documenting Research Process
Tracing decisions and activities
Documenting Research Process
● Document process steps
○ Facilitates collaboration, review, reuse
● Research journals
○ Similar to Digital Tool Criticism workshops
● Tools that support process documentation
○ Open Refine (http://openrefine.org/)
■ Interaction history
○ Jupyter notebooks (https://jupyter.org/)
■ Mix program code with narrative and visualizations
● Layered publications
○ Narrative, process, data
○ Data stories: https://stories.triply.cc/netwerk-maritieme-bronnen/
Data Has No Memory
● Through linking, our review dataset now has a complete set of ISBNs
○ This allows comparing reviews of different editions of a book
○ E.g. does a plain edition affect readers differently from a critical edition?
● Each edition has its own ISBN
○ Modeling: group reviews by ISBN, group ISBNs by title+author or NTSC
● Banana peel: we’ve hidden uncertainty!
○ Some reviews don’t specify ISBN (we looked them up separately)
○ So we don’t know which edition is reviewed
○ But the transformed dataset implies that we do!
● Possible solution: add provenance info on data and process
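A lightweight form of provenance is to store, next to each linked value, where it came from. In this sketch the source labels and records are invented; the point is that a downstream analysis can then restrict itself to reviews whose edition is actually known.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Value:
    value: str
    source: str   # where the value came from (labels here are illustrative)

reviews = [
    {"id": 1, "isbn": Value("isbn-1", "review page")},
    {"id": 2, "isbn": Value("isbn-2", "linked via author+title")},
]

def edition_known(review) -> bool:
    """Only ISBNs stated on the review page tell us which edition was reviewed."""
    return review["isbn"].source == "review page"

certain = [r["id"] for r in reviews if edition_known(r)]
print(certain)  # [1]
```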
Open Refine
Programming, Documentation and Narrative
● Jupyter notebooks
○ Mixing code (research process) with narrative, analysis and decision making
○ Used in many research disciplines
● Examples
○ https://nbviewer.jupyter.org/github/HoekR/MIGRANT/blob/master/results/exploring_data_integration/notebooks/migratie_datasets_explorations_part_1.ipynb
○ https://nbviewer.jupyter.org/github/marijnkoolen/digital-history-charter-books/blob/master/Preprocess-OHZ-charter-pages.ipynb
Concluding
Hobby Horses
Humanities scholar at the CLARIAH Toogdag:
“Our students are too stupid to write queries in a structured query language!”
Wrap Up
● Pragmatic approach to discuss transparency in DH research and infrastructure
○ Digital Tool Criticism: Reflection, checklist + questions
○ Data Scopes: Understanding data transformations in research process
○ Document Research Practices: Data has no memory
● Infrastructure should
○ Invite us to collaborate, experiment, question, reflect
○ Reveal and document transformations
● Workshops to incorporate into methodology (research practice and teaching)
○ Data Scopes 2019 (at HuC, in September)
○ Documenting Research Practices (DH Benelux 2019, in September)
Acknowledgements
● A lot of this work is a collaboration with:
○ Rik Hoekstra
○ Jasmijn van Gorp
○ Jacco van Ossenbruggen
○ Antske Fokkens
○ Liliana Melgar
○ Peter Boot
○ Ronald Haentjens Dekker
○ Marijke van Faassen
○ Lodewijk Petram
○ Jelle van Lottum
○ Marieke van Erp
○ Adina Nerghes
○ Melvin Webers
Thank You!
Questions?
Slides: http://bit.ly/HuC-2019-Hobby-Horses
References
Anderson, S., Blanke, T. and Dunn, S. 2010. Methodological commons: arts and humanities e-Science fundamentals. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 368(1925), pp. 3779-3796.
Boot, P. 2017. A Database of Online Book Response and the Nature of the Literary Thriller. In: Digital Humanities 2017, Montreal, Conference abstracts.
Burke, T. 2011. How I Talk About Searching, Discovery and Research in Courses. May 9, 2011.
Da, N.Z. 2019. The Computational Case against Computational Literary Studies. Critical Inquiry, 45(3), pp. 601-639.
Drabenstott, K.M. 2001. Web Search Strategy Development. Online, 25(4), pp. 18-25.
Fickers, A. 2012. Towards a New Digital Historicism? Doing History in the Age of Abundance. VIEW Journal of European Television History and Culture, 1(1). http://orbilu.uni.lu/bitstream/10993/7615/1/4-4-1-PB.pdf
Hitchcock, T. 2013. Confronting the Digital - Or How Academic History Writing Lost the Plot. Cultural and Social History, 10(1), pp. 9-23. https://doi.org/10.2752/147800413X13515292098070
Hoekstra, R. and Koolen, M. 2018. Data Scopes for Digital History Research. Historical Methods: A Journal of Quantitative and Interdisciplinary History, 51(2).
Koolen, M., van Gorp, J. and van Ossenbruggen, J. 2018. Lessons Learned from a Digital Tool Criticism Workshop. Digital Humanities in the Benelux 2018 Conference.
Putnam, L. 2016. The Transnational and the Text-Searchable: Digitized Sources and the Shadows They Cast. American Historical Review, 121(2), pp. 377-402.
Sloman, S.A. and Fernbach, P.M. 2017. The Knowledge Illusion: Why We Never Think Alone. New York: Riverhead Books.
Unsworth, J. 2000. Scholarly primitives: What methods do humanities researchers have in common, and how might our tools reflect this? In: Symposium on Humanities Computing: Formal Methods, Experimental Practice. King’s College, London, May 2000.
Vakkari, P. 2016. Searching as Learning: A systematization based on literature. Journal of Information Science, 42(1), pp. 7-18.
Yakel, E. 2010. Searching and seeking in the deep web: Primary sources on the internet. In: Working in the archives: Practical research methods for rhetoric and composition, pp. 102-118.
Nucleic Acid-its structural and functional complexity.
 
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
 
Structures and textures of metamorphic rocks
Structures and textures of metamorphic rocksStructures and textures of metamorphic rocks
Structures and textures of metamorphic rocks
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
 
Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
 

Hobby horses-and-detail-devils-transparency-in-digital-humanities-research-and-infrastructure

  • 1. Hobby Horses and Detail Devils
    Transparency in Digital Humanities Research and Infrastructure
    Marijn Koolen
    Team R&D - Digital Infrastructure - KNAW Humanities Cluster
    HuC Lecture - 28 May 2019 - IISH, Amsterdam
    Slide URL: http://bit.ly/HuC-2019-Hobby-Horses
  • 3. Overview
    Digital Tool Criticism - reflection on tool use
    Data Scopes - conceptual framework for data transformations
    Documenting Research Process - tracing choices and decisions
  • 4. Digital Tool Criticism
    Critical reflection on tool use
  • 5. Tools and Assumptions
    ● Input data needs to be interpreted to be usable by a digital tool
      ○ Algorithms rely on many assumptions and expectations
      ○ E.g. textual tools often assume modern English text
        ■ Regardless of the actual language of your data
      ○ If you use such a tool, you (implicitly) assume that, for your research purpose, Dutch has the same characteristics as English
    ● Which assumptions should we be aware of?
  • 6. Developing Methods for Digital Tool Criticism
    ● Workshops
      ○ Tool Crit. 2015
      ○ DH Benelux 2017 & 2018
      ○ Research master Media, Art and Performance Studies - Utrecht University
    ● Experiments in using tools to
      ○ Develop research questions
      ○ Analyse online datasets
      ○ Reproduce published research
    ● Work in small groups
      ○ Keep a collaborative research journal
      ○ Reflect on process through journal
  • 7. Visualizing Research Journeys
    We analysed the research journals of the participant groups
    Colour-coded the notes based on 5 aspects:
    Research question
    Method
    Tool
    Dataset
    Reflection (hard to read: it’s yellow and says “Reflection”)
  • 10. Research DNA Visualizations
    (Koolen, van Gorp, van Ossenbruggen 2018)
  • 11. Findings on the Workshop Format
    ● Participants liked collaboration and experimentation
    ● Collaboratively using tools prompts discussions
      ○ Face-to-face: collaboratively looking under the hood and at the consequences
      ○ Explaining how you think a tool works is a great way to bring out gaps in your own understanding (Sloman and Fernbach 2017)
    ● Many research questions require a large range of skills
      ○ Need to collaborate to ensure that at least someone involved understands specific tool details
    ● Experimentation with tools to deepen understanding
      ○ Compare intermediate output with input -> What has changed? What has disappeared?
      ○ Try different settings and compare intermediate output -> What is different?
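To make "compare intermediate output with input" concrete, here is a minimal Python sketch. The two tokenizers are hypothetical stand-ins and do not represent any particular tool. One assumes unaccented English letters, as some text tools implicitly do. The other accepts any Unicode word character. Running both settings on the same Dutch sentence and diffing the results shows what disappears under the English assumption.

```python
import re


def tokenize_ascii(text: str) -> list[str]:
    """Hypothetical tokenizer that assumes unaccented English letters only."""
    return re.findall(r"[a-z]+", text.lower())


def tokenize_unicode(text: str) -> list[str]:
    """Hypothetical tokenizer that accepts any Unicode word character."""
    return re.findall(r"\w+", text.lower())


def compare_settings(text: str) -> dict[str, set[str]]:
    """Report which tokens one setting produces that the other does not."""
    ascii_tokens = set(tokenize_ascii(text))
    unicode_tokens = set(tokenize_unicode(text))
    return {
        "disappeared": unicode_tokens - ascii_tokens,  # lost under the English assumption
        "appeared": ascii_tokens - unicode_tokens,     # fragments created by it
    }


review = "Ik heb het boek in één ruk helemaal uitgelezen."
print(tokenize_ascii(review))
print(compare_settings(review))
```

The ASCII setting silently turns "één" into the fragment "n". No error is raised, so the problem only becomes visible when the output is compared with the input.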
  • 12. Lessons on Tool Criticism
    ● Effective elements of the workshop format
      ○ Answer a series of questions on tool and data:
        ■ Who made them, when, why, what for, with what assumptions?
        ■ Similar to source criticism
      ○ Focus on integrative reflection
        ■ Need to critically reflect on tools in combination with other elements of research design:
          ● Research questions
          ● Methods
          ● Digital tools
          ● Digital data
  • 13. Model: Reflection as Integrative Practice
    An interactive model of digital tool criticism, where reflection integrates the four concepts of research questions, methods, data and tools as interactive and interdependent parts of the research process (Koolen, van Gorp & van Ossenbruggen 2019)
  • 15. Entanglement of Data and Tools
    Each step changes the underlying data!
  • 16. Lessons on Tool Criticism
    ● To what extent should we understand tool details?
      ○ At the level of data transformations (echoing Ben Schmidt 2016)
      ○ And how does that change our interpretation?
    ● To what extent can we develop tools and interfaces that support this?
      ○ Prioritize documentation
      ○ Build in elements that encourage reflection
  • 17. Search Engines: Experts in Retrieval, Masters in Hiding
  • 20. Reflection and Transparency in Tool Interfaces
    ● The inspection tool in the CLARIAH Media Suite is a first attempt
      ○ What are other ways to identify and flag issues in data and tools?
      ○ Input from researchers and developers needed!
    ● How do/can other frequently used search interfaces deal with transparency?
      ○ Nederlab
      ○ Delpher
      ○ WorldCat
      ○ Pica
      ○ Google
  • 21. Recommendations
    ● List of recommendations (Koolen, van Gorp, van Ossenbruggen - DSH 2018)
    ● For researchers:
      ○ Incorporate digital source, data and tool criticism in the research process
      ○ Explicitly ask and answer questions about assumptions, choices, limitations
        ■ Document and share workarounds
      ○ Develop a method of experimenting with a tool to test how it functions
      ○ Document the research process
    ● For tool developers and data providers (and researchers sharing datasets):
      ○ Add an “About” page and documentation on functionalities
      ○ Design UIs so as to encourage reflection!
      ○ Describe selection criteria and transformations of datasets
  • 22. Taking Up Recommendations
    Framework for thinking about data transformations
    Support for documenting research process
  • 23. Data Scopes
    Conceptual framework for data transformations
  • 25. Data Scopes
    ● Data needs processing to offer insights for research questions
      ○ Making data transformations offers different perspectives or scopes on data
        ■ Often left out of publications, outsourced as “technical detail”
        ■ But details matter: the process is intellectual effort!
    ● Data Scopes concept (Hoekstra and Koolen 2018)
      ○ Framework for thinking and communicating about research data processing
      ○ Especially for combining data from different sources
    ● Five types of transforming activities:
      ○ Selecting
      ○ Modeling
      ○ Normalizing
      ○ Linking
      ○ Classifying
    ● A form of scholarly primitives (Unsworth 2000, Anderson et al. 2010)
  • 27. Use Case 1: Analyzing Online Book Reviews
    ● Online book response corpus (Boot 2017)
      ○ ~400,000 book reviews in Dutch
      ○ From different review sites (Bol, Hebban, Dizzie, Wat Lees Jij Nu, …)
    ● Research questions
      ○ What impact does reading fiction have on readers?
        ■ How do reviewers describe the impact of a book?
        ■ Are there differences across genres/authors?
  • 28. Use Case 2: VOC Maritime Careers
    ● Pay ledgers of VOC personnel
      ○ 774,200 contracts between VOC and individual persons
      ○ Career transitions: two subsequent contracts of the same person
    ● Research questions
      ○ What are typical career paths for VOC personnel?
      ○ Do migrants have different careers or chances of promotion than non-migrants?
  • 32. Reading Impact
    “Je gaat Stijn eigenlijk een beetje begrijpen, …”
    “You start to understand Stijn, …”
    “Helaas is de schrijfstijl bedroevend, presenteert Kluun zich als een pseudo-intellectueel …”
    “Unfortunately the writing style is pathetic, Kluun presents himself as a pseudo-intellectual …”
  • 33. Reading Impact Rules
    ● 349 rules
      ○ Identifying 4 types of impact: general, narrative, style, reflection
    ● Term: boeiend (“captivating”)
      ○ Rule 2: Style impact: boeiend + style term (“boeiend taalgebruik”, captivating use of language)
      ○ Rule 3: Reflection: boeiend + topic term (“boeiende thematiek”, captivating themes)
    ● Phrase: in één ruk * uitlezen (read * in one go)
      ○ Rule 79: General impact (“Ik heb het boek in één ruk helemaal uitgelezen.”, “I read the whole book in one go.”)
      ○ Many variants
        ■ één/een/1
        ■ adem/avond/dag/keer/middag/ruk/stuk/zucht/...
        ■ uitlezen/uit
  • 34. Data Scopes for Reading Impact Analysis (1/2)
    ● Gather review data
      ○ Select review sites
      ○ Model review from web page (book title, author, ISBN, reviewer, date, rating, website, …)
      ○ Link author and book title to WorldCat record (for missing data)
        ■ Select ISBN, publisher, publication year
      ○ Link ISBN to record in boek.nl database
        ■ Select genre classification (NUR code)
  • 35. Data Scopes for Reading Impact Analysis (2/2)
    ● Extract impact expressions
      ○ Select individual sentences from book reviews
      ○ Normalise words in sentences to their lemmas
      ○ Select all sentences that match an impact rule
      ○ Classify sentences by impact rule
    ● Analyse impact
      ○ Select impact matches by book genre, author, book ID or reviewer ID
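The extraction steps in slides 33-35 (select sentences, normalise to lemmas, match rules, classify) can be sketched in Python. Everything here is a hypothetical simplification and not the project's rule engine. The lemma table is a three-entry toy, where a real pipeline would use a Dutch lemmatizer. Sentence splitting is naive. Only one rule, modelled on Rule 79, is encoded, and only the variants listed on the slide are included, since that list is itself truncated.

```python
import re

# Hypothetical toy lemma table; a real pipeline would use a Dutch lemmatizer.
LEMMAS = {"heb": "hebben", "las": "lezen", "uitgelezen": "uitlezen"}

# One rule modelled on Rule 79 ("in één ruk * uitlezen"), matched against lemmas:
# "in" + één/een/1 + a unit word, then any number of words, then uitlezen/uit.
RULE_79 = re.compile(
    r"\bin (?:één|een|1) (?:adem|avond|dag|keer|middag|ruk|stuk|zucht)\b"
    r"(?: \S+)*? (?:uitlezen|uit)\b"
)
RULES = {"rule_79": ("general", RULE_79)}


def split_sentences(review: str) -> list[str]:
    """Select individual sentences (naive split on . ! ? followed by whitespace)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]


def lemmatize(sentence: str) -> str:
    """Normalise each word to its lemma using the toy table above."""
    tokens = re.findall(r"\w+", sentence.lower())
    return " ".join(LEMMAS.get(t, t) for t in tokens)


def classify_review(review: str) -> list[tuple[str, str, str]]:
    """Return (sentence, rule id, impact type) for every sentence matching a rule."""
    matches = []
    for sentence in split_sentences(review):
        lemmas = lemmatize(sentence)
        for rule_id, (impact_type, pattern) in RULES.items():
            if pattern.search(lemmas):
                matches.append((sentence, rule_id, impact_type))
    return matches


print(classify_review("Ik heb het boek in één ruk helemaal uitgelezen. Het einde was saai!"))
```

Normalisation is what makes the rule work here. The inflected surface form "uitgelezen" does not match the pattern, but its lemma "uitlezen" does.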
  • 38. Intended and Unintended Selection
    ● Extract domain-specific sentiment lexicon
      ○ Compare positive and negative reviews
    ● Intended selection
      ○ Positive: 4+5 star reviews
      ○ Negative: 1+2 star reviews
    ● Unintended selection
      ○ Bol.com and Hebban.nl: all reviews have a star rating
      ○ Leestafel.info: none of the reviews have a rating
      ○ Selections exclude all Leestafel.info reviews
      ○ Consequence: Leestafel.info reviews are not represented in the sentiment lexicon!
    ● Selection choices lead to side effects!
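One way to catch side effects like this is to report, per source, how much survives a selection. The Python sketch below uses the site names from the slide, but the records are invented toy data and the helper functions are hypothetical. A rating of None stands for a site without star ratings.

```python
from collections import Counter

# Hypothetical toy records; rating None means the site provides no star rating.
reviews = [
    {"site": "bol.com", "rating": 5},
    {"site": "bol.com", "rating": 1},
    {"site": "hebban.nl", "rating": 4},
    {"site": "hebban.nl", "rating": 3},
    {"site": "leestafel.info", "rating": None},
    {"site": "leestafel.info", "rating": None},
]


def select_by_rating(records, allowed):
    """Intended selection: keep reviews whose rating is in `allowed`."""
    return [r for r in records if r["rating"] in allowed]


def selection_report(before, after):
    """Side-effect check: per site, (records kept, records before selection)."""
    kept = Counter(r["site"] for r in after)
    total = Counter(r["site"] for r in before)
    return {site: (kept[site], n) for site, n in total.items()}


positive = select_by_rating(reviews, {4, 5})
negative = select_by_rating(reviews, {1, 2})
report = selection_report(reviews, positive + negative)
dropped_sites = [site for site, (kept, _) in report.items() if kept == 0]
print(report)
print(dropped_sites)
```

The 3-star review is dropped on purpose. The unrated site vanishing completely is the unintended effect, and a per-source report makes it visible.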
  • 39. Modelling and Consequences
    ● Review text
      ○ Extract text only, ignore images and emojis
    ● Book identifier
      ○ To group reviews of the same book
      ○ ISBN? Not always present, and each edition has its own ISBN
    ● Dates
      ○ Underspecified dates cause undefined behaviour when sorting by date
    ● Ratings
      ○ Some review sites have a 5-star rating system, some allow half stars
      ○ Mixing different rating systems results in odd distributions
        ■ How do you compare 4-star reviews to 5-star reviews?
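One way to make date sorting behaviour explicit is to model each date as an interval. The Python sketch below assumes dates arrive as ISO-like strings with year, month or day precision. It sorts on the earliest possible date, then the latest. That rule is one choice among several, but it is a stated choice rather than undefined behaviour. The helper names are hypothetical.

```python
from calendar import monthrange
from datetime import date


def date_interval(value: str) -> tuple[date, date, str]:
    """Turn 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into (earliest, latest, precision)."""
    parts = [int(p) for p in value.split("-")]
    if len(parts) == 1:
        (year,) = parts
        return date(year, 1, 1), date(year, 12, 31), "year"
    if len(parts) == 2:
        year, month = parts
        last_day = monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day), "month"
    year, month, day = parts
    return date(year, month, day), date(year, month, day), "day"


def sort_by_date(values):
    """Explicit sorting rule: earliest possible date first, then latest possible date."""
    return sorted(values, key=lambda v: date_interval(v)[:2])


def order_is_uncertain(a: str, b: str) -> bool:
    """True if the intervals overlap, so the data cannot tell which came first."""
    a_start, a_end, _ = date_interval(a)
    b_start, b_end, _ = date_interval(b)
    return a_start <= b_end and b_start <= a_end


print(sort_by_date(["2017-05-12", "2017", "2017-05"]))
print(order_is_uncertain("2017", "2017-05-12"))
```

Keeping the precision alongside the interval lets an analysis flag, rather than silently resolve, pairs of reviews whose order is unknown.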
  • 40. Linking
    ● Add missing data via external sources
      ○ Missing ISBN, publisher and publication date info
    ● Add contextual data for interpretation
      ○ Genre/subject classification
  • 44. Classification
    ● Reduce complexity by grouping on common characteristics
    ● Boundaries are often arbitrary
      ○ NUR code: Genre/subject classification (based on Dewey Decimal Classification)
        ■ 302 - Translated literary novel
        ■ 305 - Literary Thriller
        ■ 331 - Detective
        ■ 342 - Historical novel
      ○ Different editions of the same book can get different classifications!
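Conflicting classifications across editions can be surfaced before an analysis groups by genre. The Python sketch below assumes editions have already been grouped into works, for example by title+author. The work keys and ISBNs are placeholders, not real identifiers. The NUR labels are taken from the slide.

```python
from collections import defaultdict

# Hypothetical toy data: (work key, ISBN, NUR code) per edition. Placeholders only.
editions = [
    ("work-1", "isbn-1a", 305),
    ("work-1", "isbn-1b", 331),
    ("work-2", "isbn-2a", 342),
    ("work-2", "isbn-2b", 342),
]

NUR_LABELS = {
    302: "Translated literary novel",
    305: "Literary Thriller",
    331: "Detective",
    342: "Historical novel",
}


def conflicting_classifications(rows):
    """Map each work to its sorted NUR codes, keeping only works with more than one code."""
    codes = defaultdict(set)
    for work, _isbn, nur in rows:
        codes[work].add(nur)
    return {work: sorted(c) for work, c in codes.items() if len(c) > 1}


for work, codes in conflicting_classifications(editions).items():
    print(work, [NUR_LABELS[c] for c in codes])
```

Whether such a work then counts as a thriller, a detective, or both is an interpretive decision that the data alone cannot make.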
  • 45. Maritime Career Data Scopes
    ● Selection: VOC pay ledgers between 1680 and 1794
      ○ Some ledgers are missing; before 1680, most are missing
    ● Modeling: two pay ledger entries mention the same person if
      ○ Entries have similar full names
      ○ Same place of origin
      ○ Gap between subsequent contracts is less than 6 years
      ○ The person didn’t die during the first contract period
    ● Normalizing: modernize and standardize spelling of names and ranks
    ● Classification: assign entry to person ID, rank to Occupation classification
    ● Linking: place of origin to Geo database
      ○ Is place of origin within or outside the Dutch Republic?
      ○ Because we’re interested in migrant vs. non-migrant
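The four modelling conditions can be written as a single explicit predicate, which makes each choice inspectable. This is a minimal Python sketch under stated assumptions, not the project's actual linking code. Name similarity is approximated with `difflib.SequenceMatcher` and a hypothetical 0.85 threshold. Years are plain integers. Overlapping contracts (negative gaps) are treated as different persons, which is an added assumption. The example names and places are invented.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Contract:
    full_name: str          # normalized full name
    origin: str             # normalized place of origin
    start_year: int
    end_year: int
    died_in_service: bool


def similar_names(a: str, b: str, threshold: float = 0.85) -> bool:
    """Assumption: difflib's ratio stands in for the project's name similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def same_person(first: Contract, second: Contract, max_gap: int = 6) -> bool:
    """Apply the four conditions from the slide to two subsequent contracts."""
    gap = second.start_year - first.end_year
    return (
        similar_names(first.full_name, second.full_name)
        and first.origin == second.origin
        and 0 <= gap < max_gap          # assumption: overlapping contracts never match
        and not first.died_in_service
    )


c1 = Contract("Hendrik Jansz", "Bremen", 1720, 1725, False)
c2 = Contract("Hendrick Jansz", "Bremen", 1727, 1731, False)
print(same_person(c1, c2))
```

Each parameter (threshold, maximum gap, exact origin match) is a modelling decision. Changing any of them changes which careers exist in the transformed data.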
  • 46. Normalizing Person and Place Names
  • 49. Career Transitions - Ship Gunner
  • 51. Career Transitions - First Corporal
  • 52. Career Transitions - Intermediate Results
  • 54. Data Distributions
    ● Many phenomena in data have a skewed distribution
      ○ A few high-frequency items, many low-frequency items: the long tail
      ○ Descriptive stats like mean/average are not very useful
    ● Skewed distributions appear everywhere
      ○ Maritime data: 197 distinct ranks, top 2 (or 1%) cover ~50% of the data
      ○ Book reviews: >33,000 authors, top 330 (1%) cover ~37% of the data
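The "top 1% covers X% of the data" figures can be computed directly from frequency counts. The Python sketch below uses invented toy data ("matroos" and "soldaat" are Dutch for sailor and soldier). Rounding 1% of the number of distinct values gives 2 for 197 ranks and 330 for about 33,000 authors, consistent with the slide. The sketch also shows why the mean frequency misleads: on a long tail it sits far above the typical (median) item.

```python
from collections import Counter
from statistics import mean, median


def top_share(items, fraction=0.01):
    """Return (k, share): the share of all observations covered by the k most
    frequent distinct values, where k is `fraction` of the distinct values (at least 1)."""
    counts = Counter(items)
    k = max(1, round(len(counts) * fraction))
    covered = sum(n for _, n in counts.most_common(k))
    return k, covered / sum(counts.values())


def frequency_summary(items):
    """Mean vs. median frequency per distinct value; on a long tail they diverge."""
    freqs = list(Counter(items).values())
    return mean(freqs), median(freqs)


# Hypothetical toy data: two frequent ranks and a tail of 20 ranks seen once each.
ranks = ["matroos"] * 50 + ["soldaat"] * 30 + [f"rank-{i}" for i in range(20)]
print(top_share(ranks, 0.01))
print(top_share(ranks, 0.10))
print(frequency_summary(ranks))
```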
  • 55. Data Distributions - Long Tails and Analysis
    ● Easy to focus on the head of the distribution: a small set of the most frequent items
      ○ But they are not representative, as the vast majority is different
      ○ But the long tail has too many items to analyze in detail
    ● Use classification to group low-frequency items
      ○ Group ranks by type and level: naval/military/craftsmen, first/second/third
      ○ Group books/authors by genre
      ○ But usually the same problem reappears: few large groups, many small groups
      ○ Variance within large groups is bigger than between groups
  • 57. Wrap Data Scopes
    ● This process is not “mere preparation”
      ○ but part of the “real research”
    ● Process is complex, takes intellectual effort
      ○ Requires both technical and domain knowledge and interpretation
      ○ Different choices can lead to very different analyses and interpretations
      ○ Break down complexity by engaging with intermediate results
      ○ Tools should be transparent about transformations, show intermediate results
    ● Hidden choices, hidden assumptions
      ○ Even if you didn’t consider a certain transformation, you still made a choice!
      ○ All transformations you don’t consider explicitly are implicit decisions: either that they are irrelevant, or that they shouldn’t be done!
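A tool can be "transparent about transformations" by recording each step and its effect on the data. The Python sketch below is a hypothetical helper and not part of any existing tool. It labels each step with one of the five Data Scopes activities and keeps before/after sizes, so intermediate results stay visible. The review records are invented toy data.

```python
from dataclasses import dataclass, field
from typing import Callable

# The five Data Scopes activities (Hoekstra and Koolen 2018).
DATA_SCOPE_ACTIVITIES = {"selecting", "modeling", "normalizing", "linking", "classifying"}


@dataclass
class TransformationLog:
    """Hypothetical helper: record every transformation and its effect on the data."""

    steps: list = field(default_factory=list)

    def apply(self, activity: str, description: str,
              func: Callable[[list], list], data: list) -> list:
        if activity not in DATA_SCOPE_ACTIVITIES:
            raise ValueError(f"unknown activity: {activity}")
        result = func(data)
        # Keep before/after sizes so the effect of each step stays inspectable.
        self.steps.append((activity, description, len(data), len(result)))
        return result

    def report(self) -> str:
        return "\n".join(
            f"{activity:<12} {description:<35} {n_in:>5} -> {n_out}"
            for activity, description, n_in, n_out in self.steps
        )


reviews = [
    {"site": "bol.com", "rating": 5, "text": "Prachtig boek!"},
    {"site": "hebban.nl", "rating": 3, "text": "Gaat wel."},
    {"site": "leestafel.info", "rating": None, "text": "Mooi geschreven."},
]
log = TransformationLog()
data = log.apply("selecting", "drop reviews without text",
                 lambda rs: [r for r in rs if r["text"]], reviews)
data = log.apply("selecting", "keep 1-2 and 4-5 star reviews",
                 lambda rs: [r for r in rs if r["rating"] in {1, 2, 4, 5}], data)
print(log.report())
```

A step that changes nothing (3 -> 3) is logged too. Its log entry documents a decision that would otherwise stay implicit.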
  • 58. Documenting Research Process
    Tracing decisions and activities
  • 59. Documenting Research Process
    ● Document process steps
      ○ Facilitates collaboration, review, reuse
    ● Research journals
      ○ Similar to Digital Tool Criticism workshops
    ● Tools that support process documentation
      ○ OpenRefine (http://openrefine.org/)
        ■ Interaction history
      ○ Jupyter notebooks (https://jupyter.org/)
        ■ Mix program code with narrative and visualizations
    ● Layered publications
      ○ Narrative, process, data
      ○ Data stories: https://stories.triply.cc/netwerk-maritieme-bronnen/
  • 61. Data Has No Memory
    ● Through linking, our review dataset now has a complete set of ISBNs
      ○ Allows comparing reviews of different editions of a book
      ○ E.g. does a plain edition affect readers differently from a critical edition?
    ● Each edition has its own ISBN
      ○ Modeling: group reviews by ISBN, group ISBNs by title+author or NTSC
    ● Banana peel: we’ve hidden uncertainty!
      ○ Some reviews don’t specify an ISBN (we looked them up separately)
      ○ So we don’t know which edition is reviewed
      ○ But the transformed dataset implies we do!
    ● Possible solution: add provenance info on data and process
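A minimal version of "add provenance info" is a field on each record that says where a value came from. The Python sketch below is a hypothetical illustration. The record fields, the lookup table and the provenance strings are assumptions, not the project's data model. A real project might use a standard model such as W3C PROV instead. Edition-level analyses can then exclude ISBNs that were only inferred.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class ReviewRecord:
    review_id: str
    title: str
    author: str
    isbn: Optional[str] = None
    isbn_source: str = "unknown"  # provenance: where the ISBN value came from


def link_isbn(record: ReviewRecord, lookup: dict) -> ReviewRecord:
    """Fill a missing ISBN from an external lookup, keeping track of its origin."""
    if record.isbn is not None:
        return replace(record, isbn_source="review site")
    found = lookup.get((record.title, record.author))
    if found is None:
        return record
    return replace(record, isbn=found,
                   isbn_source="linked via title+author lookup, edition uncertain")


def edition_certain(records):
    """Keep only records whose ISBN was stated by the review site itself."""
    return [r for r in records if r.isbn_source == "review site"]


# Hypothetical placeholder data.
lookup = {("Title A", "Author A"): "isbn-2"}
records = [ReviewRecord("r1", "Title A", "Author A", isbn="isbn-1"),
           ReviewRecord("r2", "Title A", "Author A")]
linked = [link_isbn(r, lookup) for r in records]
print([(r.review_id, r.isbn, r.isbn_source) for r in linked])
```

After linking, both records have an ISBN, but only the provenance field preserves the uncertainty that the plain ISBN column would hide.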
  • 65. Programming, Documentation and Narrative
    ● Jupyter notebooks
      ○ Mixing code (research process) with narrative, analysis and decision making
      ○ Used in many research disciplines
    ● Examples
      ○ https://nbviewer.jupyter.org/github/HoekR/MIGRANT/blob/master/results/exploring_data_integration/notebooks/migratie_datasets_explorations_part_1.ipynb
      ○ https://nbviewer.jupyter.org/github/marijnkoolen/digital-history-charter-books/blob/master/Preprocess-OHZ-charter-pages.ipynb
  • 69. Hobby Horses
    Humanities scholar at the CLARIAH Toogdag: “Our students are too stupid to write queries in a structured query language!”
  • 70. Wrap Up
    ● Pragmatic approach to discussing transparency in DH research and infrastructure
      ○ Digital Tool Criticism: reflection, checklist + questions
      ○ Data Scopes: understanding data transformations in the research process
      ○ Documenting Research Practices: data has no memory
    ● Infrastructure should
      ○ Invite us to collaborate, experiment, question, reflect
      ○ Reveal and document transformations
    ● Workshops to incorporate this into methodology (research practice and teaching)
      ○ Data Scopes 2019 (at HuC, in September)
      ○ Documenting Research Practices (DH Benelux 2019, in September)
  • 71. Acknowledgements
    ● A lot of this work is a collaboration with:
      ○ Rik Hoekstra
      ○ Jasmijn van Gorp
      ○ Jacco van Ossenbruggen
      ○ Antske Fokkens
      ○ Liliana Melgar
      ○ Peter Boot
      ○ Ronald Haentjens Dekker
      ○ Marijke van Faassen
      ○ Lodewijk Petram
      ○ Jelle van Lottum
      ○ Marieke van Erp
      ○ Adina Nerghes
      ○ Melvin Webers
  • 73. References
    Anderson, S., Blanke, T. and Dunn, S. 2010. Methodological commons: arts and humanities e-Science fundamentals. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 368(1925), pp. 3779-3796.
    Boot, P. 2017. A Database of Online Book Response and the Nature of the Literary Thriller. In: Digital Humanities 2017, Montreal, Conference abstracts.
    Burke, T. 2011. How I Talk About Searching, Discovery and Research in Courses. May 9, 2011.
    Da, N.Z. 2019. The Computational Case against Computational Literary Studies. Critical Inquiry, 45(3), pp. 601-639.
    Drabenstott, K.M. 2001. Web Search Strategy Development. Online, 25(4), pp. 18-25.
    Fickers, A. 2012. Towards a New Digital Historicism? Doing History in the Age of Abundance. VIEW Journal, 1(1). http://orbilu.uni.lu/bitstream/10993/7615/1/4-4-1-PB.pdf
    Hitchcock, T. 2013. Confronting the Digital - Or How Academic History Writing Lost the Plot. Cultural and Social History, 10(1), pp. 9-23. https://doi.org/10.2752/147800413X13515292098070
    Hoekstra, R. and Koolen, M. 2018. Data Scopes for Digital History Research. Historical Methods: A Journal of Quantitative and Interdisciplinary History, 51(2).
  • 74. References
    Koolen, M., van Gorp, J. and van Ossenbruggen, J. 2018. Lessons Learned from a Digital Tool Criticism Workshop. Digital Humanities in the Benelux 2018 Conference.
    Putnam, L. 2016. The Transnational and the Text-Searchable: Digitized Sources and the Shadows They Cast. American Historical Review, 121(2), pp. 377-402.
    Sloman, S.A. and Fernbach, P.M. 2017. The Knowledge Illusion: Why We Never Think Alone. New York: Riverhead Books.
    Unsworth, J. 2000. Scholarly primitives: What methods do humanities researchers have in common, and how might our tools reflect this? In: Symposium on Humanities Computing: Formal Methods, Experimental Practice. King’s College, London.
    Vakkari, P. 2016. Searching as Learning: A systematization based on literature. Journal of Information Science, 42(1), pp. 7-18.
    Yakel, E. 2010. Searching and seeking in the deep web: Primary sources on the internet. In: Working in the archives: Practical research methods for rhetoric and composition, pp. 102-118.