The Humanities Cluster invests a lot of effort in developing infrastructure and tools for digital research. As scholars we want those tools to be easy to use, without having to bother with many of the technical details. But that ease of use often makes it hard to check whether there is a devil in those details whom we should want to meet. Digital tools can do a lot of work for us, but only because they are built on a lot of assumptions. Which of these assumptions are important to consider in research? And how can we develop infrastructure and tools that wear their assumptions on their sleeves and invite us to reflect on their impact? In this talk I present our research addressing these questions. We have developed conceptual frameworks and techniques for digital tool criticism and evaluation, and for thinking and communicating about digital data processes in research. I will discuss the lessons we have learned from bringing these frameworks and techniques into practice, and how we can incorporate those lessons in digital humanities research methodology and in the development of digital infrastructure.
1. Hobby Horses and Detail Devils
Transparency in Digital Humanities Research and
Infrastructure
Marijn Koolen
Team R&D - Digital Infrastructure - KNAW Humanities Cluster
HuC Lecture - 28 May 2019 - IISH, Amsterdam
Slide URL: http://bit.ly/HuC-2019-Hobby-Horses
3. Overview
Digital Tool Criticism - reflection on tool use
Data Scopes - conceptual framework for data transformations
Documenting Research Process - Tracing choices and decisions
5. Tools and Assumptions
● Input data needs to be interpreted to be usable by a digital tool
○ Algorithms rely on many assumptions and expectations
○ E.g. textual tools often assume modern English text
■ Regardless of the actual language of your data
○ If you use such a tool on, say, Dutch data, you (implicitly) assume that, for your research purpose, Dutch has the same characteristics as English
● Which assumptions should we be aware of?
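To make this assumption concrete, here is a minimal sketch of how an English-only tool silently mishandles Dutch input. The stopword list below is a toy stand-in, not any particular tool's list:

```python
# Toy stopword filter that hard-codes modern English function words,
# as many off-the-shelf text tools implicitly do.
ENGLISH_STOPWORDS = {"the", "a", "an", "of", "in", "is", "and", "to"}

def remove_stopwords(text):
    """Drop tokens that appear in the (English-only) stopword list."""
    return [t for t in text.lower().split() if t not in ENGLISH_STOPWORDS]

english = "the style of the book is boring"
dutch = "de stijl van het boek is saai"  # the same sentence in Dutch

print(remove_stopwords(english))  # English function words removed
print(remove_stopwords(dutch))    # 'de', 'van', 'het' slip through untouched
```

The tool runs without error on the Dutch sentence, which is exactly the problem: nothing signals that its assumptions were violated.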
6. Developing Methods for Digital Tool Criticism
● Workshops
○ Tool Crit. 2015
○ DH Benelux 2017 & 2018
○ Research master Media, Art and Performance Studies - Utrecht University
● Experiments in using tools to
○ Develop research questions
○ Analyse online datasets
○ Reproduce published research
● Work in small groups
○ Keep a collaborative research journal
○ Reflect on process through journal
7. Visualizing Research Journeys
We analysed the research journals of the participant groups
Colour-coded the notes based on 5 aspects:
Research question
Method
Tool
Dataset
Reflection (hard to read: it’s yellow and says “Reflection”)
11. Findings on the Workshop Format
● Participants liked collaboration and experimentation
● Collaboratively using tools prompts discussions
○ Face-to-face: looking under the hood together and discussing the consequences
○ Explaining how you think a tool works is a great way to bring out gaps in your own understanding (Sloman and Fernbach 2017)
● Many research questions require a huge number of skills
○ Need to collaborate to ensure at least someone involved understands specific tool details
● Experimentation with tools deepens understanding
○ Compare intermediate output with input -> What has changed? What has disappeared?
○ Try different settings and compare intermediate outputs -> What is different?
12. Lessons on Tool Criticism
● Effective elements of the workshop format
○ Answer a series of questions on tool and data:
■ Who made them, when, why, what for, with what assumptions?
■ Similar to source criticism
○ Focus on integrative reflection
■ Need to critically reflect on tools in combination with the other elements of research design:
● Research questions
● Methods
● Digital tools
● Digital data
13. Model: Reflection as Integrative Practice
An interactive model of digital tool criticism, where reflection integrates the four concepts of research
questions, methods, data and tools as interactive and interdependent parts of the research process
(Koolen, van Gorp & van Ossenbruggen 2019)
16. Lessons on Tool Criticism
● To what extent should we understand tool details?
○ At the level of data transformations (echoing Ben Schmidt 2016)
○ And how does that change our interpretation?
● To what extent can we develop tools and interfaces that support this?
○ Prioritize documentation
○ Build in elements that encourage reflection
20. Reflection and Transparency in Tool Interfaces
● The inspection tool in the CLARIAH Media Suite is a first attempt
○ What are other ways to identify and flag issues in data and tools?
○ Input from researchers and developers needed!
● How do/can other often-used search interfaces deal with transparency?
○ Nederlab
○ Delpher
○ WorldCat
○ Pica
○ Google
21. Recommendations
● List of recommendations (Koolen, van Gorp & van Ossenbruggen - DSH 2018)
● For researchers:
○ Incorporate digital source, data and tool criticism in the research process
○ Explicitly ask and answer questions about assumptions, choices and limitations
■ Document and share workarounds
○ Develop a method of experimenting with a tool to test its functioning
○ Document the research process
● For tool developers and data providers (and researchers sharing datasets):
○ Add an “About” page and documentation on functionalities
○ Design UIs so as to encourage reflection!
○ Describe selection criteria and transformations of data sets
25. Data Scopes
● Data needs processing to offer insights for research questions
○ Applying data transformations offers different perspectives, or scopes, on the data
■ Often left out of publications, outsourced as “technical detail”
■ But details matter, process is intellectual effort!
● Data Scopes concept (Hoekstra and Koolen 2018)
○ Framework for thinking and communicating about research data processing
○ Especially for combining data from different sources
● Five types of transforming activities:
○ Selecting
○ Modeling
○ Normalizing
○ Linking
○ Classifying
● A form of scholarly primitives (Unsworth 2000, Anderson et al. 2010)
27. Use Case 1: Analyzing Online Book Reviews
● Online book response corpus (Boot 2017)
○ ~400,000 book reviews in Dutch
○ From different review sites (Bol, Hebban, Dizzie, Wat Lees Jij Nu, …)
● Research questions
○ What impact does reading fiction have on readers?
■ How do reviewers describe the impact of a book?
■ Are there differences across genres/authors?
28. Use Case 2: VOC Maritime Careers
● Pay ledgers of VOC personnel
○ 774,200 contracts between VOC and individual persons
○ Career transitions: two subsequent contracts of the same person
● Research questions
○ What are typical career paths for VOC personnel?
○ Do migrants have different careers or chances of promotion than non-migrants?
32. Reading Impact
“Je gaat Stijn eigenlijk een beetje begrijpen, …”
“You start to understand Stijn, …”
“Helaas is de schrijfstijl bedroevend, presenteert Kluun zich als een
pseudo-intellectueel …”
“Unfortunately the writing style is pathetic, Kluun presents himself as a pseudo-intellectual …”
33. Reading Impact Rules
● 349 rules
○ Identifying 4 types of impact: general, narrative, style, reflection
● Term: boeiend (“captivating”)
○ Rule 2: Style impact: boeiend + style term (“boeiend taalgebruik” / captivating language use)
○ Rule 3: Reflection: boeiend + topic term (“boeiende thematiek” / captivating themes)
● Phrase: in één ruk * uitlezen (“read in one go”)
○ Rule 79: General impact (“Ik heb het boek in één ruk helemaal uitgelezen.” / “I read the whole book in one go.”)
○ Many variants
■ één/een/1
■ adem/avond/dag/keer/middag/ruk/stuk/zucht/...
■ uitlezen/uit
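The phrase rule above, with its variants, can be sketched as a single regular expression. This is a hypothetical approximation of rule 79, not the actual rule formalism used in the project:

```python
import re

# Hypothetical approximation of rule 79 ("general impact"):
# the phrase "in één ruk * uitlezen" plus the listed variants.
NUMERAL = r"(?:één|een|1)"
UNIT = r"(?:adem|avond|dag|keer|middag|ruk|stuk|zucht)"
READ = r"(?:uitgelezen|uitlezen|uit)"
RULE_79 = re.compile(rf"\bin {NUMERAL} {UNIT}\b.*\b{READ}\b", re.IGNORECASE)

def matches_rule_79(sentence):
    """True if the sentence contains any variant of the phrase."""
    return RULE_79.search(sentence) is not None

print(matches_rule_79("Ik heb het boek in één ruk helemaal uitgelezen."))  # True
print(matches_rule_79("In 1 dag uitgelezen!"))                             # True
print(matches_rule_79("De thematiek is boeiend."))                         # False
```

With 349 rules, the real system presumably needs a more structured rule representation than one hand-written regex each, but the matching principle is the same.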
34. Data Scopes for Reading Impact Analysis (1/2)
● Gather review data
○ Select review sites
○ Model review from web page (book title, author, ISBN, reviewer, date, rating, website, …)
○ Link author and book title to WorldCat record (for missing data)
■ Select ISBN, publisher, publication year
○ Link ISBN to record in the boek.nl database
■ Select genre classification (NUR code)
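The modeling step above can be made explicit as a record type in which missing values stay visible rather than silently defaulting; all field names here are illustrative assumptions, not the project's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative model of one review record; Optional fields mark values
# a site may not provide and that linking must fill in later.
@dataclass
class Review:
    book_title: str
    author: str
    reviewer: str
    website: str
    date: Optional[str] = None      # may be missing or underspecified
    rating: Optional[float] = None  # not all sites use star ratings
    isbn: Optional[str] = None      # to be linked via WorldCat if absent
    nur_code: Optional[int] = None  # genre, to be linked via boek.nl
```

Keeping absence explicit (None rather than a default of 0 stars) matters for the selection side effects discussed on the following slides.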
35. Data Scopes for Reading Impact Analysis (2/2)
● Extract impact expressions
○ Select individual sentences from book reviews
○ Normalize words in sentences to their lemmas
○ Select all sentences that match an impact rule
○ Classify sentences by impact rule
● Analyse impact
○ Select impact matches by book genre or author or book ID or reviewer ID
38. Intended and Unintended Selection
● Extract domain specific sentiment lexicon
○ Compare positive and negative reviews
● Intended selection
○ Positive: 4+5 star reviews
○ Negative: 1+2 star reviews
● Unintended selection
○ Bol.com and Hebban.nl: all reviews have a star rating
○ Leestafel.info: none of the reviews have a rating
○ Selections exclude all Leestafel.info reviews
○ Consequence: Leestafel.info reviews not represented in sentiment lexicon!
● Selection choices lead to side effects!
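The side effect above can be reproduced in a few lines; the review dicts are toy data, and making the excluded set explicit is one possible guard, not the project's code.

```python
# Toy reviews: bol.com and hebban.nl always carry star ratings,
# leestafel.info never does (as on the slide).
reviews = [
    {"site": "bol.com", "rating": 5},
    {"site": "hebban.nl", "rating": 1},
    {"site": "leestafel.info", "rating": None},
]

# Intended selection: positive vs negative reviews by star rating.
positive = [r for r in reviews if r["rating"] in (4, 5)]
negative = [r for r in reviews if r["rating"] in (1, 2)]

# Guard against the unintended selection: report what fell through.
excluded = [r for r in reviews if r not in positive and r not in negative]
```

Here `excluded` contains exactly the leestafel.info review: logging this set surfaces the silent exclusion before it skews the sentiment lexicon.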
39. Modelling and Consequences
● Review text
○ Extract text only, ignore images and emojis
● Book identifier
○ To group reviews of same book
○ ISBN? Not always present, and each edition has its own ISBN
● Dates:
○ Underspecified dates cause undefined behaviour when sorting by date
● Ratings:
○ Some review sites use a 5-star rating system, some allow half stars
○ Mixing different rating systems results in odd distributions
■ How do you compare 4-star reviews to 5-star reviews?
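One way to confront the mixed rating systems above is to map every site onto a common scale before comparing; this is a sketch, and the per-site maxima are illustrative assumptions.

```python
# Assumed maximum rating per site (illustrative values).
SCALE_MAX = {"bol.com": 5.0, "hebban.nl": 5.0, "watleesjij.nu": 10.0}

def normalize_rating(site: str, rating: float) -> float:
    """Map a site-specific rating onto [0, 1]; unknown sites raise KeyError."""
    return rating / SCALE_MAX[site]
```

Rescaling makes 4/5 and 8/10 numerically comparable, but it does not remove the distributional oddities noted above: reviewers may use a 10-point scale differently from a 5-star one, so the choice of normalization is itself a modeling decision.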
40. Linking
● Add missing data via external sources
○ Missing ISBN, publisher and publication date info
● Add contextual data for interpretation
○ Genre/subject classification
41.
42.
43.
44. Classification
● Reduce complexity by grouping on common characteristics
● Boundaries are often arbitrary
○ NUR code: Genre/subject classification (based on Dewey Decimal Classification)
■ 302 - Translated literary novel
■ 305 - Literary Thriller
■ 331 - Detective
■ 342 - Historical novel
○ Different editions of same book can get different classifications!
45. Maritime Career Data Scopes
● Selection: VOC pay ledgers between 1680 and 1794
○ Some ledgers are missing, before 1680 most are missing
● Modeling: two pay ledger entries mention the same person if
○ Entries have similar full names
○ Same place of origin
○ Gap between subsequent contracts is less than 6 years
○ The person didn’t die during the first contract period
● Normalizing: modernize and standardize spelling of names and ranks
● Classification: assign each entry to a person ID, each rank to an occupation classification
● Linking: place of origin to Geo database
○ Is place of origin within or outside the Dutch Republic?
○ Because we’re interested in migrant vs. non-migrant
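The matching model above can be sketched as a predicate over two ledger entries. This sketch assumes exact equality after normalization, whereas the slide says "similar" names, so a real implementation would use fuzzy string similarity; the field names and the `normalize` helper are illustrative assumptions.

```python
def normalize(s: str) -> str:
    # Stand-in for the spelling modernization/standardization step.
    return s.lower().strip()

def same_person(entry_a: dict, entry_b: dict) -> bool:
    """entry_a is the earlier contract, entry_b the later one."""
    return (normalize(entry_a["full_name"]) == normalize(entry_b["full_name"])
            and normalize(entry_a["origin"]) == normalize(entry_b["origin"])
            and 0 <= entry_b["start_year"] - entry_a["end_year"] < 6
            and not entry_a["died_in_service"])
```

Every threshold here (the 6-year gap, the notion of "similar" names) is a modeling choice that shapes which careers the analysis can see.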
54. Data Distributions
● Many phenomena in data have a skewed distribution
○ Few high-frequency items, many low-frequency ones: a long tail
○ Descriptive statistics like the mean are not very useful
● They appear everywhere
○ Maritime data: 197 distinct ranks, the top 2 (~1%) cover ~50% of the data
○ Book reviews: >33,000 authors, the top 330 (1%) cover ~37% of the data
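The head-coverage figures above can be checked with a small helper; the toy counts below are invented for illustration, not the actual VOC frequencies.

```python
from collections import Counter

def head_coverage(counts: Counter, head_size: int) -> float:
    """Fraction of all observations covered by the head_size most frequent items."""
    total = sum(counts.values())
    head = sum(c for _, c in counts.most_common(head_size))
    return head / total

# Toy long-tailed distribution: 2 of 6 ranks cover 95% of observations.
ranks = Counter({"matroos": 500, "soldaat": 450, "bosschieter": 30,
                 "korporaal": 10, "chirurgijn": 5, "trompetter": 5})
```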
55. Data Distributions - Long Tails and Analysis
● Easy to focus on the head of the distribution: the small set of most frequent items
○ But these are not representative, as the vast majority of items differ from them
○ And the long tail has too many items to analyze in detail
● Use classification to group low frequent items
○ Group ranks by type and level: naval/military/craftsmen, first/second/third
○ Group book/authors by genre
○ But usually same problem reappears: few large groups, many small groups
○ Variance within large groups is bigger than between groups
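The grouping strategy above, and the way skew reappears after it, can be shown with the same toy counts; the rank-to-class mapping is an illustrative assumption.

```python
from collections import Counter

# Illustrative mapping from individual ranks to broader classes.
RANK_CLASS = {"matroos": "naval", "bosschieter": "naval",
              "soldaat": "military", "korporaal": "military",
              "chirurgijn": "craftsmen", "trompetter": "craftsmen"}

ranks = Counter({"matroos": 500, "soldaat": 450, "bosschieter": 30,
                 "korporaal": 10, "chirurgijn": 5, "trompetter": 5})

classes = Counter()
for rank, n in ranks.items():
    classes[RANK_CLASS[rank]] += n
# The skew survives grouping: two large classes, one small one.
```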
57. Wrap Data Scopes
● This process is not “mere preparation”
○ but part of the “real research”
● Process is complex, takes intellectual effort
○ Requires both technical and domain knowledge and interpretation
○ Different choices can lead to very different analyses and interpretations
○ Break down complexity by engaging with intermediate results
○ Tools should be transparent about transformations, show intermediate results
● Hidden choices, hidden assumptions
○ Even if you didn’t consider a certain transformation, you still made a choice!
○ Every transformation you don’t consider explicitly is an implicit decision: either that it is irrelevant, or that it shouldn’t be done!
59. Documenting Research Process
● Document process steps
○ Facilitates collaboration, review, reuse
● Research journals
○ Similar to Digital Tool Criticism workshops
● Tools that support process documentation
○ OpenRefine (http://openrefine.org/)
■ Interaction history
○ Jupyter notebooks (https://jupyter.org/)
■ Mix program code with narrative and visualizations
● Layered publications
○ Narrative, process, data
○ Data stories: https://stories.triply.cc/netwerk-maritieme-bronnen/
61. Data Has No Memory
● Through linking, our review dataset now has complete set of ISBNs
○ Allows comparing reviews of different editions of a book
○ E.g. does a plain edition affect readers differently from a critical edition?
● Each edition has own ISBN
○ Modeling: group reviews by ISBN, group ISBNs by title+author or NTSC
● Banana peel: we’ve hidden uncertainty!
○ Some reviews don’t specify ISBN (we looked them up separately)
○ So we don’t know which edition is reviewed
○ But transformed dataset implies we do!
● Possible solution: add provenance info on data and process
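The proposed solution can be sketched as attaching a provenance note whenever linking fills in a value; the field names and the dummy ISBN are illustrative assumptions, not the project's code.

```python
def link_isbn(review: dict, looked_up_isbn: str) -> dict:
    """Return a copy of the review with an ISBN plus a provenance note."""
    linked = dict(review)
    linked["isbn"] = looked_up_isbn
    linked["provenance"] = {
        "isbn": "stated in review" if review.get("isbn")
                else "linked via external lookup"
    }
    return linked

# A review without an ISBN: the enriched record remembers the lookup.
enriched = link_isbn({"reviewer": "anon", "isbn": None}, "9789999999999")
```

Downstream analysis can then distinguish editions we know from editions we guessed, instead of the transformed dataset implying a certainty it does not have.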
65. Programming, Documentation and Narrative
● Jupyter notebooks
○ Mixing code (research process) with narrative, analysis and decision making
○ Used in many research disciplines
● Examples
○ https://nbviewer.jupyter.org/github/HoekR/MIGRANT/blob/master/results/exploring_data_integration/notebooks/migratie_datasets_explorations_part_1.ipynb
○ https://nbviewer.jupyter.org/github/marijnkoolen/digital-history-charter-books/blob/master/Preprocess-OHZ-charter-pages.ipynb
69. Hobby Horses
Humanities scholar at the CLARIAH Toogdag:
“Our students are too stupid to write queries in a structured query language!”
70. Wrap Up
● Pragmatic approach to discuss transparency in DH research and infrastructure
○ Digital Tool Criticism: Reflection, checklist + questions
○ Data Scopes: Understanding data transformations in research process
○ Document Research Practices: Data has no memory
● Infrastructure should
○ Invite us to collaborate, experiment, question, reflect
○ Reveal and document transformations
● Workshops to incorporate into methodology (research practice and teaching)
○ Data Scopes 2019 (at HuC, in September)
○ Documenting Research Practices (DH Benelux 2019, in September)
71. Acknowledgements
● A lot of this work is a collaboration with:
○ Rik Hoekstra
○ Jasmijn van Gorp
○ Jacco van Ossenbruggen
○ Antske Fokkens
○ Liliana Melgar
○ Peter Boot
○ Ronald Haentjens Dekker
○ Marijke van Faassen
○ Lodewijk Petram
○ Jelle van Lottum
○ Marieke van Erp
○ Adina Nerghes
○ Melvin Webers
73. References
Anderson, S., Blanke, T. and Dunn, S., 2010. Methodological commons: arts and humanities e-Science fundamentals. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 368(1925), pp. 3779-3796.
Boot, P., 2017. A Database of Online Book Response and the Nature of the Literary Thriller. In: Digital Humanities 2017, Montreal, Conference Abstracts.
Burke, T., 2011. How I Talk About Searching, Discovery and Research in Courses. May 9, 2011.
Da, N.Z., 2019. The Computational Case against Computational Literary Studies. Critical Inquiry, 45(3), pp. 601-639.
Drabenstott, K.M., 2001. Web Search Strategy Development. Online, 25(4), pp. 18-25.
Fickers, A., 2012. Towards a New Digital Historicism? Doing History in the Age of Abundance. VIEW Journal, 1(1). http://orbilu.uni.lu/bitstream/10993/7615/1/4-4-1-PB.pdf
Hitchcock, T., 2013. Confronting the Digital: Or How Academic History Writing Lost the Plot. Cultural and Social History, 10(1), pp. 9-23. https://doi.org/10.2752/147800413X13515292098070
Hoekstra, R. and Koolen, M., 2018. Data Scopes for Digital History Research. Historical Methods: A Journal of Quantitative and Interdisciplinary History, 51(2).
74. References (continued)
Koolen, M., van Gorp, J. and van Ossenbruggen, J., 2018. Lessons Learned from a Digital Tool Criticism Workshop. Digital Humanities in the Benelux 2018 Conference.
Putnam, L., 2016. The Transnational and the Text-Searchable: Digitized Sources and the Shadows They Cast. American Historical Review, 121(2), pp. 377-402.
Sloman, S.A. and Fernbach, P.M., 2017. The Knowledge Illusion: Why We Never Think Alone. New York: Riverhead Books.
Unsworth, J., 2000. Scholarly Primitives: What Methods Do Humanities Researchers Have in Common, and How Might Our Tools Reflect This? In: Symposium on Humanities Computing: Formal Methods, Experimental Practice. King’s College, London.
Vakkari, P., 2016. Searching as Learning: A Systematization Based on Literature. Journal of Information Science, 42(1), pp. 7-18.
Yakel, E., 2010. Searching and Seeking in the Deep Web: Primary Sources on the Internet. In: Working in the Archives: Practical Research Methods for Rhetoric and Composition, pp. 102-118.