Search in Research, Let’s Make it More Complex!
Collaboratively Looking Under the Hood and Its Consequences
Marijn Koolen
Humanities Cluster - Royal Netherlands Academy of Arts and Sciences
CLARIAH Media Studies Summer School
Netherlands Institute for Sound and Vision, 3 July 2018
Overview
1. Search in Research
a. Search as part of research process
b. Search vs. other access methods
2. Search, Retrieval and Ranking
a. Retrieval Systems, Ranking Algorithms and Relevance Models
3. Searching in Digital Collections
a. Understanding (digital) collections and their construction
b. Tool analysis through experimentation
4. Search Strategies and Corpus Building
a. Systematic searching
b. Search strategies and sampling
1. Search in Research
● Research Phases
○ Exploration, gathering, analysis, synthesis, presentation
○ Extremely non-linear (affordance of digital realm)
● Search happens throughout research process
○ Search phases: pre-focus, focus, post-focus
○ Use different types of collections and search engines
■ General purpose search engines,
■ Domain- and collection-specific (e.g. GLAMs),
■ Personal/private (offline) collections
○ Search strategies:
■ Ad hoc or systematic: berrypicking (Bates 1989), keyword harvesting (Burke 2011), …
■ Important for data and tool criticism
Research Process
● For many online materials access is limited to search interface
○ Browsing is guided by available structure
■ Drill down via facets
■ Navigate via metadata fields (if enabled)
○ Without (relevant) structure, direct search is only practical alternative
● Searching as exploration
○ How does search engine provide overview?
■ How big is collection?
■ How is collection structure communicated?
■ What (meta)data is available?
■ How are search characteristics explained?
■ How are search results summarised?
Search Engine as Mediator
● Browsing takes you past materials you did not set out to find:
○ Navigating your way to relevance
○ Impresses on you what else there is (see also Putnam 2016)
● Keyword search tends to focus on relevance
○ Pushes back related/nearby materials
○ Collection structure can be exposed through facets (restoring overview)
● Search and research methodology
○ Impact of digital keyword search needs to be reflected in methodology
○ How do you account for search process in scholarly communication?
■ Citation practices are based on analogue browsing/searching in archives and libraries
■ Pre-focus to focus: switch between ad hoc and systematic?
■ Non-linearity: exploration never stops, assumptions constantly challenged
Browsing vs. Keyword Searching
'To take a single example of this disconnect between research process and representation, many of us
use and cite eighteenth and nineteenth-century newspapers as simple hard-copy references without
mention of how we navigated to the specific article, page and issue. In doing so, we actively misrepresent
the limitations within which we are working.' (Hitchcock 2013, 12)
'This is not only about being explicit about our use of keyword searching - it is about moving beyond a
traditional form of scholarship to data modelling and to what Franco Moretti calls “distant reading”.'
(Hitchcock, Confronting the Digital, 2013, p. 19).
Keyword Search and “Confronting the Digital”
Information Search and Seeking
● Search takes place in context
○ Part of information seeking and overall information behaviour (Wilson)
○ As information behaviour changes (phases), so do seeking and search behaviour
● Reflection-in-action
○ When and where are choice points?
○ How do search actions relate to strategy and information need?
Digital Tool Criticism
Search and Accountability
● What should scholars account for?
○ Aspects of sources, tools and process
● Digital source criticism
○ How to evaluate digital sources (Fickers 2012)
○ Who made digital source, when, why, what for, how?
● Digital tool criticism
○ How to evaluate impact of digital tools (Koolen et al. 2018)
○ Reflection-in-action, experimentation
● Data Scopes
○ How to communicate research process to others (Hoekstra & Koolen 2018)
○ Discuss process of selection, modelling, normalization, linking, classification
2. Search, Retrieval and Ranking
Anatomy of Retrieval Process
Retrieval - Matching and Similarity
● Matching based on user query
○ Query: free text, controlled facet, example (doc, AV or text)
○ Matching docs returned in certain order (non-matching are not retrieved)
■ How does search engine perform matching (esp. for free text and example)?
■ Potentially many objects match query: does order matter?
● Similarity
○ Degree of matching: some match better than others (notion of similarity)
■ Retrieve most similar documents first (ranking)
○ Similar how? Does interface explain?
● Retrieval and ranking
○ Retrieval: which matching documents are returned to the user as results?
○ Ranking: in which order are the results returned?
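The retrieval/ranking distinction in the last two bullets can be shown in a few lines of code. Below is a minimal sketch with toy documents and a single-term query; all names and data are illustrative, not how any particular engine works internally:

```python
# A toy collection of three short "documents"
docs = {"d1": "voetbal uitslagen voetbal", "d2": "verkiezingen", "d3": "voetbal"}
query = "voetbal"

# Retrieval: which documents match the query at all?
retrieved = [d for d, text in docs.items() if query in text.split()]

# Ranking: in which order are the matches returned? Here: by term count.
ranked = sorted(retrieved, key=lambda d: docs[d].split().count(query), reverse=True)

print(retrieved, ranked)  # ['d1', 'd3'] ['d1', 'd3'] (d1 first: two occurrences)
```

Real engines differ in exactly these two steps: what counts as a match, and which score orders the matches.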
Retrieval, Ranking and Relevance
● Retrieval results form a set
○ Can be ordered or unordered (e.g. SQL or SPARQL query)
■ Even unordered sets need to be presented to the user in some order
○ Criteria for ordering: alphabetic, size, recency, popularity (views, likes, citations, links)
■ Ordering re-organizes materials, temporarily disrupts “original” organization
■ Provides different view on materials
● Many systems perform relevance ranking
○ Relevant to whom or what?
■ Query: document similarity scores
■ User: e.g. search history, preferences
■ Situation: user, location, time, device, query, work context (page views, annotations)
■ Other aspects: quality, diversity, controversy, polarity, exploration/exploitation, ...
● How does an algorithm understand the notion of relevance?
○ Statistical interpretation:
■ Generally: frequent words carry less signal, so look for unexpected terms
■ Many ways of scoring signal
○ TF-IDF:
■ Term Frequency in document (relevance of term in document)
■ Inverse of Document Frequency in collection (commonness of term across docs)
○ Probabilistic Language Model (PLM):
■ Probability of picking term from document as bag of words (relevance of term in doc)
■ Probability of picking term from collection as bag of words (commonness of term)
○ Many other relevance models, e.g. BM25, DFR, SDM, …
■ Different interpretations of relevance, hence different rankings
Algorithmic Interpretation of Relevance
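As a rough illustration of how these two models interpret relevance differently, here is a minimal sketch over a toy collection. The tokenization, the smoothing weight in the language model and the example documents are assumptions made for this example, not the configuration of any actual search engine:

```python
import math
from collections import Counter

# Toy collection: each document as a list of terms
docs = {
    "d1": "voetbal uitslagen voetbal competitie".split(),
    "d2": "verkiezingen uitslagen kamer".split(),
    "d3": "voetbal".split(),
}

N = len(docs)
df = Counter(t for terms in docs.values() for t in set(terms))  # document frequency
coll = Counter(t for terms in docs.values() for t in terms)     # collection frequency
coll_size = sum(coll.values())

def tfidf(term, doc):
    # term frequency in document x inverse document frequency in collection
    return docs[doc].count(term) * math.log(N / df[term])

def plm(term, doc, lam=0.5):
    # smoothed language model: mix P(term|document) with P(term|collection)
    return lam * docs[doc].count(term) / len(docs[doc]) + (1 - lam) * coll[term] / coll_size

for d in docs:
    print(d, round(tfidf("voetbal", d), 3), round(plm("voetbal", d), 3))
```

Note that TF-IDF ranks the longer d1 first, while the length-normalized PLM ranks the one-word d3 first: this previews the document-length issue on the next slide.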
Ranking Issues
● Document length
○ TF-IDF doesn’t model document length, favours longer documents
○ PLM explicitly normalizes on document length, favours shorter documents
○ Upshot: Delpher API returns short documents first for short queries
● Document priors: are all documents equal or not?
○ Can use document prior probability (independent of query)
○ Can favour documents that are more popular, recent, authoritative, …
○ Can favour documents that are more appropriate for situation (location, time of day, …)
● Problem: how do you know how search engine scores relevance?
○ How much should you know about it?
○ Many GLAM search engines have relatively straightforward relevance models, no doc priors
○ Google uses many hundreds of features for document, query, user and situation
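To illustrate how a query-independent prior can reorder results, here is a small sketch that multiplies a query score by a recency prior; the scores and the half-life are invented for the example:

```python
# (query_score, years_since_publication) per document; illustrative numbers
results = {"a": (2.1, 40), "b": (1.8, 2), "c": (1.9, 10)}

def recency_prior(age, half_life=10):
    # query-independent prior: halve a document's weight every `half_life` years
    return 0.5 ** (age / half_life)

ranked = sorted(results,
                key=lambda d: results[d][0] * recency_prior(results[d][1]),
                reverse=True)
print(ranked)  # ['b', 'c', 'a']: the recent document overtakes the best matcher
```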
Relevance in Metadata Records
● Relevance ranking of metadata records
○ Metadata records are peculiar textual representations
■ Minimal amount of text, low redundancy
■ Majority of terms occur only once
○ Which part of TF-IDF contributes more to score of metadata record?
○ Which fields are useful/used for matching?
● NISV collection
○ Search engine indexes metadata records
■ Some records have lengthy itemized descriptions, others do not
■ Some have transcripts, others do not
○ Consequences for retrieving? And for ranking?
■ How does search engine handle this?
■ How does search engine communicate this?
● Hard to match keywords against AV signal directly
○ Option: use text representation for AV document
■ E.g. metadata, description, script, speech transcript, ...
○ Option: use AV representation of query
■ E.g. example document or user recording
■ Use audio or visual similarity (again, similar how?)
Retrieving and Ranking Audiovisual Materials
● Experiment to understand search functionalities
○ How can you find out if multiple search terms are treated with Boolean AND or OR operators?
○ How can you find out if terms are stemmed/normalized?
● Phrase search:
○ What happens when you use quotation marks to group terms into a phrase?
○ How do the results compare to those using no quotation marks?
● Proximity search:
○ Can you specify that terms should be near each other?
● Fuzzy search: wildcard and edit distance searches
○ Controlling lexical variation vs. uncontrolled wildcard search
○ voetbal+voetballen vs. voetbal* (matches voetbalvereniging, voetbalveld, ...)
Opaqueness of Interfaces and Experimentation
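The questions above can be answered empirically by sending controlled query pairs and comparing hit counts. Below is a sketch of such a probe; the endpoint URL, parameter name and the "total" response field are placeholders, not the actual API of the Media Suite or any other engine:

```python
import requests

# Placeholder endpoint; substitute the API of the engine you are probing
SEARCH_URL = "https://example.org/api/search"

def hit_count(query):
    # assumes the API reports the total number of matches in a 'total' field
    return requests.get(SEARCH_URL, params={"q": query}).json()["total"]

# AND vs OR: a multi-word count close to the rarest single-term count suggests
# AND; a count approaching the sum of the single-term counts suggests OR.
for q in ["voetbal", "uitslagen", "voetbal uitslagen", '"voetbal uitslagen"']:
    print(q, hit_count(q))

# Stemming probe: (near) identical counts for singular and plural
# suggest that word forms are normalized.
print(hit_count("voetbal"), hit_count("voetballen"))
```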
● Experiment with Search and Compare tools of the CLARIAH Media Suite
○ Find out if stopwords are removed
○ Find out if words are stemmed/normalized
○ Find out how multi-word queries are interpreted, i.e. as AND or OR
○ Find out how standard search operators work
■ Boolean AND, OR and NOT
■ Quotation marks for phrases
Exercise
3. Searching in Digital Collections
● Collections of GLAMs are often built up over decades
○ Based on aims and selection criteria
■ Rarely "complete", dependent on availability of materials
○ Digital access via digitization, or digital archiving (born-digital)
■ Some things are lost in this process (e.g. context, quality, …)
● Heterogeneity: mix of object/source types (sub-collections)
○ Different modalities, different ways of accessing and presenting
■ Text vs. Image vs. AV vs. 3D (or 4D)
Nature of Digital Collections
Nature of Metadata
● Digital access via metadata
○ Metadata: data about the object/source
○ Types: formal, structural, technical, administrative, aboutness
○ Metadata fields allow selection and search via specific fields
■ Title, description, creator, creation date, genre, …
○ Allows (seemingly) uniform access to heterogeneous collections
■ But, different materials have different aspects to describe
■ Edition is relevant for books and films, not so much for paintings
● Metadata creation process
○ Often done with limited time, information and system flexibility
○ Inherently subjective, especially content analysis
● Size matters
○ Requirements change as size of collection grows (also depends on expectations)
● Hierarchical organization
○ 4 levels
■ Series: De Wereld Draait Door
■ Season: De Wereld Draait Door 2016
■ Program: De Wereld Draait Door 21-06-2016
■ Segment: De Wereld Draait Door 21-06-2016
○ Each level has a metadata record (with overlap in fields, e.g. title)
● Follows archival standard
○ Describe aspect at highest relevant level
○ Don’t repeat at lower levels unless it deviates (e.g. main titles)
○ Fonds: aggregation of documents from same origin
Archival Structure and NISV Audiovisual Collection
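For illustration, the four levels and the describe-once principle might be represented as nested records like this; the field names are assumptions made for the example, not the actual NISV schema:

```python
# Descriptions live at the highest relevant level and are not repeated below
series = {
    "level": "series", "title": "De Wereld Draait Door", "genre": "talk show",
    "seasons": [{
        "level": "season", "year": 2016,
        "programs": [{
            "level": "program", "date": "2016-06-21",
            # no genre here: it is inherited from the series record
            "segments": [{"level": "segment", "start": "00:12:30"}],
        }],
    }],
}
```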
● Power of the archive
○ Problem of perspective (from archive-as-source to archive-as-subject, Stoler 2002)
● History of the archive
○ Collections created over decades often go through changes in
■ selection criteria, cataloguers (human or algorithm),
■ cataloguing budgets, policies, rules, practice and vocabularies,
■ software (migrations and updates), hardware,
■ institutional mission, societal attitudes, …
○ Most of these aspects remain undocumented or partially documented
● Consequences
○ Almost inherently incomplete, inconsistent and sometimes necessarily incorrect
○ After many years, it's hard to retrace what happened
■ and how it affects access, selection and analysis
Digital Source and Data Criticism
Metadata in theory vs. metadata in practice (source: Jaap Kamps)
Combined Collections
● Several portals combine (heterogeneous) collections
○ Examples:
■ Europeana, European Newspapers, EUscreen, Nederlab, Delpher, Online Archives of California, …
○ Worldwide aggregated collections:
■ ArchiveGrid (1000+ archives): over 5M finding aids
■ WorldCat (72,000 libraries): 400M records, 2.6B assets, 100M persons
● Huge challenge for source criticism as well as search
○ Collections vary in size, provenance, selection criteria, metadata policies, interpretation and richness
○ Heterogeneous metadata schemas have been mapped to single schema
■ Causes problems for interpretation
■ E.g. what does creator mean for paintings, films, tv series, letters, advertisements, ...?
Assessing Metadata Quality
● Questions
○ What are pitfalls in relying on metadata?
○ How can we evaluate metadata quality?
○ What are relevant aspects to consider?
● Collection inspection
○ In CLARIAH Media Suite we created a tool for inspecting metadata
■ Esp. useful for complex collections like NISV audiovisual collection
■ Somewhat ad hoc, please feel encouraged to give feedback!
○ Please go to the Media Suite and go to the Collection Inspector tool
■ Click on “select field to analyse” and let the interface load the data on completeness (this will take a while)
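What the Collection Inspector reports can be approximated offline: completeness of a field is simply the share of records with a non-empty value. A minimal sketch over made-up records (the field names and values are illustrative, not the NISV schema):

```python
# Field completeness over a list of metadata records (records as plain dicts)
records = [
    {"title": "DWDD 21-06-2016", "genre": "talk show", "awards": None},
    {"title": "Journaal", "genre": None, "awards": None},
    {"title": "Zomergasten", "genre": "interview", "awards": "Nipkowschijf"},
]

def completeness(records, field):
    # share of records in which the field has a non-empty value
    filled = sum(1 for r in records if r.get(field) not in (None, "", []))
    return filled / len(records)

for field in ("title", "genre", "awards"):
    print(f"{field}: {completeness(records, field):.0%}")
# title: 100%, genre: 67%, awards: 33%
```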
Assessing Timelines and Other Visualizations
● Timeline visualizations give view of temporal spread
○ Very difficult to interpret properly
● Issues with absolute frequencies:
○ Collection materials not evenly distributed
○ Need to compare query-specific distribution to the overall collection distribution
● Issues with relative frequencies:
○ Incompleteness not evenly distributed (use collection inspector)
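The normalization issue amounts to dividing query hits per year by collection totals per year. A toy example with invented counts:

```python
# Illustrative counts: query hits and total collection items per year
hits_per_year = {1960: 12, 1970: 30, 1980: 30}
coll_per_year = {1960: 200, 1970: 1000, 1980: 500}

relative = {year: hits_per_year[year] / coll_per_year[year] for year in hits_per_year}
print(relative)  # {1960: 0.06, 1970: 0.03, 1980: 0.06}
# The absolute timeline peaks in 1970/1980, but relative to collection size
# 1970 is the least prominent year for this query.
```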
Retrievability and Metadata Characteristics
● Different types of metadata fields
○ Controlled vocabulary: e.g. broadcast channel (radio or tv)
○ Number: number of episodes/seasons/segments
○ Time/date: program length, recording date
○ Free keyword/keyphrase: title, person name (tend to be non-unique)
○ Free text: description, summary, transcript, … (tend to be unique)
● Different types allow different forms of retrieval and ranking
○ Long text fields have more terms, with higher frequencies
■ Some types of programs have longer descriptions/transcript
■ These match more queries, so higher chance of being retrieved
■ Impact of long text fields on ranking depends on relevance model!
○ Repeated values allow aggregation, navigation
● Some search interfaces offer facets to narrow down search results
○ E.g. broadcaster and genre in the CLARIAH Media Suite
○ Facets provide overview, afford focusing through selection
● How do facets work?
○ Based on metadata fields: rich schema has rich options for facets
○ Types of metadata fields: controlled vocab, number, date, keyword/phrase, free text
■ Facets work for field with limited range of values, so not free text fields
○ Long tails in facets: typically, few high frequency, many low frequency values
Metadata and Search Facets
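Under the hood, a facet is little more than a value count over a controlled field. A sketch with invented genre values, showing the typical long tail:

```python
from collections import Counter

# Facet = value counts over a controlled metadata field (illustrative values)
genres = ["news", "news", "talk show", "news", "documentary",
          "talk show", "quiz", "opera"]

facet = Counter(genres).most_common()
print(facet)
# [('news', 3), ('talk show', 2), ('documentary', 1), ('quiz', 1), ('opera', 1)]
# Typical long tail: a few high-frequency values, many values occurring once.
```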
Exercise
● Experiment with the Collection Inspector of the CLARIAH Media Suite
○ Try out the collection inspector:
■ Scroll through the list of fields to get an idea of what is available
■ Look at completeness of fields such as “genre”, “keywords” and “awards”
■ Which metadata fields are relatively complete?
■ At which archival levels are they most complete?
● Explore which fields are available and which fields make good facets
○ Explore facet distributions in entire collection and for specific queries
4. Search Strategies and Corpus Building
● Importance of selection criteria
○ Do you have to hand pick each document?
○ Or can you select sets based on matching criteria?
○ Is representativeness important? If so, representativeness of what?
○ Or completeness? Why?
● Exploiting facets and dates
○ Filtering: align facets/dates with research focus
○ Sampling: compare across facets
■ Which facet types can you use?
○ Sampling strategies
■ Sample per facet/year (e.g. X items per facet/year)
■ Within facets, select random or not
Searching for Corpus Building
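A sampling strategy like "X items per facet value" can be made explicit, and reproducible, in a few lines. Below is a sketch under an assumed record structure and facet field; the data is invented:

```python
import random
from collections import defaultdict

# Stratified sample: up to x items per facet value (records are illustrative)
records = [{"id": i, "genre": g}
           for i, g in enumerate(["news"] * 50 + ["talk show"] * 20 + ["opera"] * 3)]

def sample_per_facet(records, facet_field, x, seed=42):
    strata = defaultdict(list)
    for r in records:
        strata[r[facet_field]].append(r)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return {value: rng.sample(items, min(x, len(items)))
            for value, items in strata.items()}

sample = sample_per_facet(records, "genre", x=5)
print({g: len(items) for g, items in sample.items()})
# {'news': 5, 'talk show': 5, 'opera': 3}
```

Fixing the random seed makes the sample reproducible, which helps when documenting corpus-building choices later.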
Tracking Context in Corpus Building
● Why were certain documents selected?
○ How were they selected?
○ What strategy was used?
○ Documenting helps you understand and remember your choices
● Do research goals and questions change during collection?
○ Interacting with sources during search updates knowledge structures (Vakkari 2016)
○ Updates tend to be small and incremental, hence barely noticeable
○ Explicit reflection-in-action can bring these to the surface (Koolen et al. 2018)
○ Adding annotations can also provide context
Systematic Searching
● Systematic (comprehensive) search has two factors (Yakel 2010):
○ Search strategy (user)
○ Search functionalities (system)
○ Functionalities shape/affect strategy
● Step 1: systematic search for relevant collections online
○ Different collections/sites offer different search functionalities and levels of detail
○ Explicitly address what consequences this has for your strategy and research goals
● Step 2:
○ Explore individual collections using one or more strategies
○ "Researchers need to be flexible and creative to accommodate the vagaries of cataloging
practices." (Yakel 2010, p. 110)
○ Footnote and reference chasing: references often give an "information scent", suggesting
other collections and items to explore.
Search Strategies
● Web search strategies defined by Drabenstott (2001)
○ Discussed in archive context by Yakel (2010)
● Five strategies
○ Synonym generation
○ Chaining
○ Name collection
○ Pearl growing
○ Successive segmentation
● Somewhat related to information seeking patterns by Ellis (1989)
○ Starting, chaining, browsing, differentiating, monitoring, extracting
● Synonym generation: 1) search with relevant term, 2) close read results to
identify related terms (wordclouds, facets), 3) search via related terms for
synonyms.
● Chaining: follow references/citations (explicit or implicit), identify relevant
subset and use explicit structure to explore connected/related subset
● Name collection: search with keywords, identify relevant names, search with
names, identify related names and keywords, repeat. Similar to keyword
harvesting (Burke 2011).
Drabenstott’s Strategies (1/2)
Drabenstott’s Strategies (2/2)
● Pearl growing: start small and focused with specific search terms, slowly
expand out with additional terms to broader topics/themes
● Successive segmentation: opposite of pearl growing; start broad and
increasingly zoom in and focus; e.g. make queries increasingly specific by
adding (ANDing) keywords, replace broad terms with lower frequency terms,
or select facets
Search Strategies and Research Phases
● Research phase
○ Exploration <-> search phase pre-focus
i. Ad hoc, no need yet for systematic search
ii. Mostly pearl growing and/or successive segmentation to determine focus
○ Analysis <-> search phase focus
i. Switch to systematic, determine strategy
ii. Use chaining, name collection, synonym generation (for coverage/representation, boundaries)
● But reality resists:
○ (Re)search process is very non-linear
○ Boundary between exploration and analysis is not always clear
○ Late discoveries can prompt or force new directions, ...
When To Stop
● Often switch from exploration to “sorta” systematic search
○ But hard to remember and explain what and how you searched
○ Moreover, difficult to determine when to stop
○ Explicit strategy allows for stopping criteria
● Stopping criteria
○ Check whole set/sample, all available facets, ...
○ Diminishing returns: you increasingly encounter already-seen materials, and newly relevant items become rare
○ When stopping, make explicit (at least for yourself) when and why you stopped
● Meta-strategy:
○ Change strategy/tactics
○ E.g. successive segmentation -> harvest keywords -> switch segment -> harvest keywords, ...
Wrap Up
● Search in research
○ How to incorporate these processes in research methodology
● Large, heterogeneous collections introduce issues for research
○ Assessing incompleteness of materials
○ Assessing incompleteness, incorrectness and inconsistency of metadata
● Looking under the hood
○ Evaluating information access functionalities (search and browse)
○ Selecting an appropriate search strategy for research goals
○ Determining success/failure of searches
○ Understanding search for corpus building
References
Burke, T. 2011. How I Talk About Searching, Discovery and Research in Courses. May 9, 2011.
Drabenstott, K.M. 2001. Web Search Strategy Development. Online, 25(4), pp. 18-25.
Fickers, A. 2012. Towards a New Digital Historicism? Doing History in the Age of Abundance. VIEW Journal, 1(1). http://orbilu.uni.lu/bitstream/10993/7615/1/4-4-1-PB.pdf
Hitchcock, T. 2013. Confronting the Digital: Or How Academic History Writing Lost the Plot. Cultural and Social History, 10(1), pp. 9-23. https://doi.org/10.2752/147800413X13515292098070
Hoekstra, R. and M. Koolen. 2018. Data Scopes for Digital History Research. Historical Methods: A Journal of Quantitative and Interdisciplinary History, 51(2).
Koolen, M., J. van Gorp and J. van Ossenbruggen. 2018. Lessons Learned from a Digital Tool Criticism Workshop. Digital Humanities in the Benelux 2018 Conference.
Putnam, L. 2016. The Transnational and the Text-Searchable: Digitized Sources and the Shadows They Cast. American Historical Review, 121(2), pp. 377-402.
Vakkari, P. 2016. Searching as Learning: A Systematization Based on Literature. Journal of Information Science, 42(1), pp. 7-18.
Yakel, E. 2010. Searching and Seeking in the Deep Web: Primary Sources on the Internet. In: Working in the Archives: Practical Research Methods for Rhetoric and Composition, pp. 102-118.

More Related Content

What's hot

Presentation Timo Kouwenhoven FIATIFTA
Presentation Timo Kouwenhoven FIATIFTAPresentation Timo Kouwenhoven FIATIFTA
Presentation Timo Kouwenhoven FIATIFTATimo Kouwenhoven
 
"Mass Surveillance" through Distant Reading
"Mass Surveillance" through Distant Reading"Mass Surveillance" through Distant Reading
"Mass Surveillance" through Distant Reading
Shalin Hai-Jew
 
Semantic Search
Semantic SearchSemantic Search
Semantic Search
sssw2012
 
Large-Scale Semantic Search
Large-Scale Semantic SearchLarge-Scale Semantic Search
Large-Scale Semantic Search
Roi Blanco
 
Text REtrieval Conference (TREC) Dynamic Domain Track 2015
Text REtrieval Conference (TREC) Dynamic Domain Track 2015Text REtrieval Conference (TREC) Dynamic Domain Track 2015
Text REtrieval Conference (TREC) Dynamic Domain Track 2015
Grace Hui Yang
 
Educational Standards Webinar - Sept 2015 - Patricia Payton
Educational Standards Webinar - Sept 2015 - Patricia PaytonEducational Standards Webinar - Sept 2015 - Patricia Payton
Educational Standards Webinar - Sept 2015 - Patricia Payton
BookNet Canada
 
How search engines work Anand Saini
How search engines work Anand SainiHow search engines work Anand Saini
How search engines work Anand SainiDr,Saini Anand
 
Graph Models for Deep Learning
Graph Models for Deep LearningGraph Models for Deep Learning
Graph Models for Deep Learning
Experfy
 
Taxonomy design best practices
Taxonomy design best practices Taxonomy design best practices
Taxonomy design best practices
voginip
 
Information searching & retrieving techniques khalid
Information searching & retrieving techniques khalidInformation searching & retrieving techniques khalid
Information searching & retrieving techniques khalidKhalid Mahmood
 
Capitalizing on Machine Reading to Engage Bigger Data
Capitalizing on Machine Reading to Engage Bigger DataCapitalizing on Machine Reading to Engage Bigger Data
Capitalizing on Machine Reading to Engage Bigger Data
Shalin Hai-Jew
 
Letting the Machine Code Qualitative and Mixed Methods Data in NVivo 10
Letting the Machine Code Qualitative and Mixed Methods Data in NVivo 10Letting the Machine Code Qualitative and Mixed Methods Data in NVivo 10
Letting the Machine Code Qualitative and Mixed Methods Data in NVivo 10
Shalin Hai-Jew
 
Bringing semantic publishing into TEI: ideas and pointers
Bringing semantic publishing into TEI: ideas and pointersBringing semantic publishing into TEI: ideas and pointers
Bringing semantic publishing into TEI: ideas and pointersUniversity of Bologna
 
Spatial Decision Support Portal- Presented at AAG 2010
Spatial Decision Support Portal- Presented at AAG 2010Spatial Decision Support Portal- Presented at AAG 2010
Spatial Decision Support Portal- Presented at AAG 2010
Nathan Strout
 

What's hot (14)

Presentation Timo Kouwenhoven FIATIFTA
Presentation Timo Kouwenhoven FIATIFTAPresentation Timo Kouwenhoven FIATIFTA
Presentation Timo Kouwenhoven FIATIFTA
 
"Mass Surveillance" through Distant Reading
"Mass Surveillance" through Distant Reading"Mass Surveillance" through Distant Reading
"Mass Surveillance" through Distant Reading
 
Semantic Search
Semantic SearchSemantic Search
Semantic Search
 
Large-Scale Semantic Search
Large-Scale Semantic SearchLarge-Scale Semantic Search
Large-Scale Semantic Search
 
Text REtrieval Conference (TREC) Dynamic Domain Track 2015
Text REtrieval Conference (TREC) Dynamic Domain Track 2015Text REtrieval Conference (TREC) Dynamic Domain Track 2015
Text REtrieval Conference (TREC) Dynamic Domain Track 2015
 
Educational Standards Webinar - Sept 2015 - Patricia Payton
Educational Standards Webinar - Sept 2015 - Patricia PaytonEducational Standards Webinar - Sept 2015 - Patricia Payton
Educational Standards Webinar - Sept 2015 - Patricia Payton
 
How search engines work Anand Saini
How search engines work Anand SainiHow search engines work Anand Saini
How search engines work Anand Saini
 
Graph Models for Deep Learning
Graph Models for Deep LearningGraph Models for Deep Learning
Graph Models for Deep Learning
 
Taxonomy design best practices
Taxonomy design best practices Taxonomy design best practices
Taxonomy design best practices
 
Information searching & retrieving techniques khalid
Information searching & retrieving techniques khalidInformation searching & retrieving techniques khalid
Information searching & retrieving techniques khalid
 
Capitalizing on Machine Reading to Engage Bigger Data
Capitalizing on Machine Reading to Engage Bigger DataCapitalizing on Machine Reading to Engage Bigger Data
Capitalizing on Machine Reading to Engage Bigger Data
 
Letting the Machine Code Qualitative and Mixed Methods Data in NVivo 10
Letting the Machine Code Qualitative and Mixed Methods Data in NVivo 10Letting the Machine Code Qualitative and Mixed Methods Data in NVivo 10
Letting the Machine Code Qualitative and Mixed Methods Data in NVivo 10
 
Bringing semantic publishing into TEI: ideas and pointers
Bringing semantic publishing into TEI: ideas and pointersBringing semantic publishing into TEI: ideas and pointers
Bringing semantic publishing into TEI: ideas and pointers
 
Spatial Decision Support Portal- Presented at AAG 2010
Spatial Decision Support Portal- Presented at AAG 2010Spatial Decision Support Portal- Presented at AAG 2010
Spatial Decision Support Portal- Presented at AAG 2010
 

Similar to Search in Research, Let's Make it More Complex!

Hobby horses-and-detail-devils-transparency-in-digital-humanities-research-an...
Hobby horses-and-detail-devils-transparency-in-digital-humanities-research-an...Hobby horses-and-detail-devils-transparency-in-digital-humanities-research-an...
Hobby horses-and-detail-devils-transparency-in-digital-humanities-research-an...
Marijn Koolen
 
A hands-on approach to digital tool criticism: Tools for (self-)reflection
A hands-on approach to digital tool criticism: Tools for (self-)reflectionA hands-on approach to digital tool criticism: Tools for (self-)reflection
A hands-on approach to digital tool criticism: Tools for (self-)reflection
Marijn Koolen
 
Data Scopes - Towards transparent data research in digital humanities (Digita...
Data Scopes - Towards transparent data research in digital humanities (Digita...Data Scopes - Towards transparent data research in digital humanities (Digita...
Data Scopes - Towards transparent data research in digital humanities (Digita...
Marijn Koolen
 
Managing Ireland's Research Data - 3 Research Methods
Managing Ireland's Research Data - 3 Research MethodsManaging Ireland's Research Data - 3 Research Methods
Managing Ireland's Research Data - 3 Research Methods
Rebecca Grant
 
Starr Hoffman - Data Collection & Research Design
Starr Hoffman - Data Collection & Research Design Starr Hoffman - Data Collection & Research Design
Starr Hoffman - Data Collection & Research Design
National Information Standards Organization (NISO)
 
Search & Recommendation: Birds of a Feather?
Search & Recommendation: Birds of a Feather?Search & Recommendation: Birds of a Feather?
Search & Recommendation: Birds of a Feather?
Toine Bogers
 
Information Retrieval Fundamentals - An introduction
Information Retrieval Fundamentals - An introduction Information Retrieval Fundamentals - An introduction
Information Retrieval Fundamentals - An introduction
Grace Hui Yang
 
Trendspotting: Helping you make sense of large information sources
Trendspotting: Helping you make sense of large information sourcesTrendspotting: Helping you make sense of large information sources
Trendspotting: Helping you make sense of large information sources
Marieke Guy
 
Data Analytics.03. Data processing
Data Analytics.03. Data processingData Analytics.03. Data processing
Data Analytics.03. Data processing
Alex Rayón Jerez
 
Search, Report, Wherever You Are: A Novel Approach to Assessing User Satisfac...
Search, Report, Wherever You Are: A Novel Approach to Assessing User Satisfac...Search, Report, Wherever You Are: A Novel Approach to Assessing User Satisfac...
Search, Report, Wherever You Are: A Novel Approach to Assessing User Satisfac...
Rachel Vacek
 
Requirements for Learning Analytics
Requirements for Learning AnalyticsRequirements for Learning Analytics
Requirements for Learning Analytics
Tore Hoel
 
Tutorial: CASRAI Standards Development (for a non-technology audience) - Davi...
Tutorial: CASRAI Standards Development (for a non-technology audience) - Davi...Tutorial: CASRAI Standards Development (for a non-technology audience) - Davi...
Tutorial: CASRAI Standards Development (for a non-technology audience) - Davi...
CASRAI
 
A Research Plan to Study Impact of a Collaborative Web Search Tool on Novice'...
A Research Plan to Study Impact of a Collaborative Web Search Tool on Novice'...A Research Plan to Study Impact of a Collaborative Web Search Tool on Novice'...
A Research Plan to Study Impact of a Collaborative Web Search Tool on Novice'...
Karthikeyan Umapathy
 
A Framework For Effective Content Strategy Based On Heuristic Evaluation (Res...
A Framework For Effective Content Strategy Based On Heuristic Evaluation (Res...A Framework For Effective Content Strategy Based On Heuristic Evaluation (Res...
A Framework For Effective Content Strategy Based On Heuristic Evaluation (Res...
Nim Dvir
 
QQML Panel 2014: Pratt Institute SILS
QQML Panel 2014: Pratt Institute SILSQQML Panel 2014: Pratt Institute SILS
QQML Panel 2014: Pratt Institute SILSA. M. Kelleher
 
Using Qualitative Methods for Library Evaluation: An Interactive Workshop
Using Qualitative Methods for Library Evaluation: An Interactive WorkshopUsing Qualitative Methods for Library Evaluation: An Interactive Workshop
Using Qualitative Methods for Library Evaluation: An Interactive Workshop
OCLC
 
Using Qualitative Methods for Library Evaluation: An Interactive Workshop
Using Qualitative Methods for Library Evaluation: An Interactive WorkshopUsing Qualitative Methods for Library Evaluation: An Interactive Workshop
Using Qualitative Methods for Library Evaluation: An Interactive Workshop
Lynn Connaway
 
Wheatley and Hervieux "Voice-Assistants, Artificial Intelligence, and the fut...
Wheatley and Hervieux "Voice-Assistants, Artificial Intelligence, and the fut...Wheatley and Hervieux "Voice-Assistants, Artificial Intelligence, and the fut...
Wheatley and Hervieux "Voice-Assistants, Artificial Intelligence, and the fut...
National Information Standards Organization (NISO)
 
Survey Research Methods with Lynn Silipigni Connaway
Survey Research Methods with Lynn Silipigni ConnawaySurvey Research Methods with Lynn Silipigni Connaway
Survey Research Methods with Lynn Silipigni Connaway
Lynn Connaway
 
Review of search and retrieval strategies
Review of search and retrieval strategiesReview of search and retrieval strategies
Review of search and retrieval strategies
Abid Fakhre Alam
 

Similar to Search in Research, Let's Make it More Complex! (20)

Hobby horses-and-detail-devils-transparency-in-digital-humanities-research-an...
Hobby horses-and-detail-devils-transparency-in-digital-humanities-research-an...Hobby horses-and-detail-devils-transparency-in-digital-humanities-research-an...
Hobby horses-and-detail-devils-transparency-in-digital-humanities-research-an...
 
A hands-on approach to digital tool criticism: Tools for (self-)reflection
A hands-on approach to digital tool criticism: Tools for (self-)reflectionA hands-on approach to digital tool criticism: Tools for (self-)reflection
A hands-on approach to digital tool criticism: Tools for (self-)reflection
 
Data Scopes - Towards transparent data research in digital humanities (Digita...
Data Scopes - Towards transparent data research in digital humanities (Digita...Data Scopes - Towards transparent data research in digital humanities (Digita...
Data Scopes - Towards transparent data research in digital humanities (Digita...
 
Managing Ireland's Research Data - 3 Research Methods
Managing Ireland's Research Data - 3 Research MethodsManaging Ireland's Research Data - 3 Research Methods
Managing Ireland's Research Data - 3 Research Methods
 
Starr Hoffman - Data Collection & Research Design
Starr Hoffman - Data Collection & Research Design Starr Hoffman - Data Collection & Research Design
Starr Hoffman - Data Collection & Research Design
 
Search & Recommendation: Birds of a Feather?
Search & Recommendation: Birds of a Feather?Search & Recommendation: Birds of a Feather?
Search & Recommendation: Birds of a Feather?
 
Information Retrieval Fundamentals - An introduction
Information Retrieval Fundamentals - An introduction Information Retrieval Fundamentals - An introduction
Information Retrieval Fundamentals - An introduction
 
Trendspotting: Helping you make sense of large information sources
Trendspotting: Helping you make sense of large information sourcesTrendspotting: Helping you make sense of large information sources
Trendspotting: Helping you make sense of large information sources
 
Data Analytics.03. Data processing
Data Analytics.03. Data processingData Analytics.03. Data processing
Data Analytics.03. Data processing
 
Search, Report, Wherever You Are: A Novel Approach to Assessing User Satisfac...
Search, Report, Wherever You Are: A Novel Approach to Assessing User Satisfac...Search, Report, Wherever You Are: A Novel Approach to Assessing User Satisfac...
Search, Report, Wherever You Are: A Novel Approach to Assessing User Satisfac...
 
Requirements for Learning Analytics
Requirements for Learning AnalyticsRequirements for Learning Analytics
Requirements for Learning Analytics
 
Tutorial: CASRAI Standards Development (for a non-technology audience) - Davi...
Tutorial: CASRAI Standards Development (for a non-technology audience) - Davi...Tutorial: CASRAI Standards Development (for a non-technology audience) - Davi...
Tutorial: CASRAI Standards Development (for a non-technology audience) - Davi...
 
A Research Plan to Study Impact of a Collaborative Web Search Tool on Novice'...
A Research Plan to Study Impact of a Collaborative Web Search Tool on Novice'...A Research Plan to Study Impact of a Collaborative Web Search Tool on Novice'...
A Research Plan to Study Impact of a Collaborative Web Search Tool on Novice'...
 
A Framework For Effective Content Strategy Based On Heuristic Evaluation (Res...
A Framework For Effective Content Strategy Based On Heuristic Evaluation (Res...A Framework For Effective Content Strategy Based On Heuristic Evaluation (Res...
A Framework For Effective Content Strategy Based On Heuristic Evaluation (Res...
 
QQML Panel 2014: Pratt Institute SILS
QQML Panel 2014: Pratt Institute SILSQQML Panel 2014: Pratt Institute SILS
QQML Panel 2014: Pratt Institute SILS
 
Using Qualitative Methods for Library Evaluation: An Interactive Workshop
Using Qualitative Methods for Library Evaluation: An Interactive WorkshopUsing Qualitative Methods for Library Evaluation: An Interactive Workshop
Using Qualitative Methods for Library Evaluation: An Interactive Workshop
 
Using Qualitative Methods for Library Evaluation: An Interactive Workshop
Using Qualitative Methods for Library Evaluation: An Interactive WorkshopUsing Qualitative Methods for Library Evaluation: An Interactive Workshop
Using Qualitative Methods for Library Evaluation: An Interactive Workshop
 
Wheatley and Hervieux "Voice-Assistants, Artificial Intelligence, and the fut...
Wheatley and Hervieux "Voice-Assistants, Artificial Intelligence, and the fut...Wheatley and Hervieux "Voice-Assistants, Artificial Intelligence, and the fut...
Wheatley and Hervieux "Voice-Assistants, Artificial Intelligence, and the fut...
 
Survey Research Methods with Lynn Silipigni Connaway
Survey Research Methods with Lynn Silipigni ConnawaySurvey Research Methods with Lynn Silipigni Connaway
Survey Research Methods with Lynn Silipigni Connaway
 
Review of search and retrieval strategies
Review of search and retrieval strategiesReview of search and retrieval strategies
Review of search and retrieval strategies
 

More from Marijn Koolen

Recommender Systems NL Meetup
Recommender Systems NL MeetupRecommender Systems NL Meetup
Recommender Systems NL Meetup
Marijn Koolen
 
Narrative-Driven Recommendation for Casual Leisure Needs
Narrative-Driven Recommendation for Casual Leisure NeedsNarrative-Driven Recommendation for Casual Leisure Needs
Narrative-Driven Recommendation for Casual Leisure Needs
Marijn Koolen
 
Digital History - Maritieme Carrieres bij de VOC
Digital History - Maritieme Carrieres bij de VOCDigital History - Maritieme Carrieres bij de VOC
Digital History - Maritieme Carrieres bij de VOC
Marijn Koolen
 
Facilitating reusable third-party annotations in the digital edition
Facilitating reusable third-party annotations in the digital editionFacilitating reusable third-party annotations in the digital edition
Facilitating reusable third-party annotations in the digital edition
Marijn Koolen
 
Narrative-Driven Recommendation for Casual Leisure Needs
Narrative-Driven Recommendation for Casual Leisure NeedsNarrative-Driven Recommendation for Casual Leisure Needs
Narrative-Driven Recommendation for Casual Leisure Needs
Marijn Koolen
 
Scholary Web Annotation - HuC Live 2018
Scholary Web Annotation - HuC Live 2018Scholary Web Annotation - HuC Live 2018
Scholary Web Annotation - HuC Live 2018
Marijn Koolen
 

More from Marijn Koolen (6)

Recommender Systems NL Meetup
Recommender Systems NL MeetupRecommender Systems NL Meetup
Recommender Systems NL Meetup
 
Narrative-Driven Recommendation for Casual Leisure Needs
Narrative-Driven Recommendation for Casual Leisure NeedsNarrative-Driven Recommendation for Casual Leisure Needs
Narrative-Driven Recommendation for Casual Leisure Needs
 
Digital History - Maritieme Carrieres bij de VOC
Digital History - Maritieme Carrieres bij de VOCDigital History - Maritieme Carrieres bij de VOC
Digital History - Maritieme Carrieres bij de VOC
 
Facilitating reusable third-party annotations in the digital edition
Facilitating reusable third-party annotations in the digital editionFacilitating reusable third-party annotations in the digital edition
Facilitating reusable third-party annotations in the digital edition
 
Narrative-Driven Recommendation for Casual Leisure Needs
Narrative-Driven Recommendation for Casual Leisure NeedsNarrative-Driven Recommendation for Casual Leisure Needs
Narrative-Driven Recommendation for Casual Leisure Needs
 
Scholary Web Annotation - HuC Live 2018
Scholary Web Annotation - HuC Live 2018Scholary Web Annotation - HuC Live 2018
Scholary Web Annotation - HuC Live 2018
 

Recently uploaded

一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 

Recently uploaded (20)

一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 

Search in Research, Let's Make it More Complex!

  • 1. Search in Research, Let’s Make it More Complex! Collaboratively Looking Under the Hood and Its Consequences Marijn Koolen Humanities Cluster - Royal Netherlands Academy of Arts and Sciences CLARIAH Media Studies Summer School Netherlands Institute for Sound and Vision, 3 July 2018
  • 2. Overview 1. Search in Research a. Search as part of research process b. Search vs. other access methods 2. Search, Retrieval and Ranking a. Retrieval Systems, Ranking Algorithms and Relevance Models 3. Searching in Digital Collections a. Understanding (digital) collections and their construction b. Tool analysis through experimentation 4. Search Strategies and Corpus Building a. Systematic searching b. Search strategies and sampling
  • 3. 1. Search in Research
  • 4. ● Research Phases ○ Exploration, gathering, analysis, synthesis, presentation ○ Extremely non-linear (affordance of digital realm) ● Search happens throughout research process ○ Search phases: pre-focus, focus, post-focus ○ Use different types of collections and search engines ■ General purpose search engines, ■ Domain- and collection-specific (e.g. GLAMS), ■ Personal/private (offline) collections ○ Search strategies: ■ Ad hoc or systematic: berrypicking (Bates 1989), keyword harvesting (Burke 2011), … ■ Important for data and tool criticism Research Process
  • 5. ● For many online materials access is limited to search interface ○ Browsing is guided by available structure ■ Drill down via facets ■ Navigate via metadata fields (if enabled) ○ Without (relevant) structure, direct search is only practical alternative ● Searching as exploration ○ How does search engine provide overview? ■ How big is collection? ■ How is collection structure communicated? ■ What (meta)data is available? ■ How are search characteristics explained? ■ How are search results summarised? Search Engine as Mediator
  • 6. ● Browsing brings you along unintended materials: ○ Navigating your way to relevance ○ Impresses on you what else there is (see also Putnam 2016) ● Keyword search tends to focus on relevance ○ Pushes back related/nearby materials ○ Collection structure can be enabled to allow faceting (overview) ● Search and research methodology ○ Impact of digital keyword search needs to be reflected in methodology ○ How do you account for search process in scholarly communication? ■ Method of citation is based on analogue browse/search in archives and libraries ■ Pre-focus to focus: switch between ad hoc and systematic? ■ Non-linearity: exploration never stops, assumptions constantly challenged Browsing vs. Keyword Searching
  • 7. 'To take a single example of this disconnect between research process and representation, many of us use and cite eighteenth and nineteenth-century newspapers as simple hard-copy references without mention of how we navigated to the specific article, page and issue. In doing so, we actively misrepresent the limitations within which we are working.' (Hitchcock 2013, 12) 'This is not only about being explicit about our use of keyword searching - it is about moving beyond a traditional form of scholarship to data modelling and to what Franco Moretti calls “distant reading”.' (Hitchcock, Confronting the Digital, 2013, p. 19). Keyword Search and “Confronting the Digital”
  • 8. Information Search and Seeking ● Search takes place in context ○ Part of seeking, and overall inf. behaviour (Wilson) ○ As inf. behaviour changes (phases), so does seeking and search behaviour ● Reflection-in-action ○ When and where are choice points? ○ How do search actions relate to strategy and inf. need?
  • 10. Search and Accountability ● What should scholars account for? ○ Aspects of sources, tools and process ● Digital source criticism ○ How to evaluate digital sources (Fickers 2012) ○ Who made digital source, when, why, what for, how? ● Digital tool criticism ○ How to evaluate impact of digital tools (Koolen et al. 2018) ○ Reflection-in-action, experimentation ● Data Scopes ○ How to communicate research process to others (Hoekstra & Koolen 2018) ○ Discuss process of selection, modelling, normalization, linking, classification
  • 11. 2. Search, Retrieval and Ranking
  • 13. Retrieval - Matching and Similarity ● Matching based on user query ○ Query: free text, controlled facet, example (doc, AV or text) ○ Matching docs returned in certain order (non-matching are not retrieved) ■ How does search engine perform matching (esp. for free text and example)? ■ Potentially many objects match query: does order matter? ● Similarity ○ Degree of matching: some match better than others (notion of similarity) ■ Retrieve most similar documents first (ranking) ○ Similar how? Does interface explain? ● Retrieval and ranking ○ Retrieval: which matching documents are returned to the user as results? ○ Ranking: in which order are the results returned?
  • 14. Retrieval, Ranking and Relevance ● Retrieval results form a set ○ Can be ordered or unordered (e.g. SQL or SPARQL query) ■ Even unordered sets need to be presented to the user in some order ○ Criteria for ordering: alphabetic, size, recency, popularity (views, likes, citations, links) ■ Ordering re-organizes materials, temporarily disrupts “original” organization ■ Provides different view on materials ● Many systems perform relevance ranking ○ Relevant to who or what? ■ Query: document similarity scores ■ User: e.g. search history, preferences ■ Situation: user, location, time, device, query, work context (page views, annotations) ■ Other aspects: quality, diversity, controversy, polarity, exploration/exploitation, ...
  • 15. ● How does an algorithm understand the notion of relevance? ○ Statistical interpretation: ■ Generally: frequent words carry less signal, look for unexpected stuff ■ Many ways of scoring signal ○ TF-IDF: ■ Term Frequency in document (relevance of term in document) ■ Inverse of Document Frequency in collection (commonness of term across docs) ○ Probabilistic Language Model (PLM): ■ Probability of picking term from document as bag of words (relevance of term in doc) ■ Probability of picking term from collection as bag of words (commonness of term) ○ Many other relevance models, e.g. BM25, DFR, SDM, … ■ Different interpretations of relevance, hence different rankings Algorithmic Interpretation of Relevance
  • 16.
  • 17.
  • 18. Ranking Issues ● Document length ○ TF-IDF doesn’t model document length, favours longer documents ○ PLM explicitly normalizes on document length, favours shorter documents ○ Upshot: Delpher API returns short documents first for short queries ● Document priors: are all documents equal or not? ○ Can use document prior probability (independent of query) ○ Can favour documents that are more popular, recent, authoritative, … ○ Can favour documents that are more appropriate for situation (location, time of day, …) ● Problem: how do you know how search engine scores relevance? ○ How much should you know about it? ○ Many GLAM search engines have relatively straightforward relevance models, no doc priors ○ Google uses many hundreds of features for document, query, user and situation
  • 19. Relevance in Metadata Records ● Relevance ranking of metadata records ○ Metadata records are peculiar textual representations ■ Minimal amount of text, low redundancy ■ Majority of terms occur only once ○ Which part of TF-IDF contributes more to score of metadata record? ○ Which fields are useful/used for matching? ● NISV collection ○ Search engine indexes metadata records ■ Some records have lengthy itemized descriptions, some have not ■ Some have transcripts, some have not ○ Consequences for retrieving? And for ranking? ■ How does search engine handle this? ■ How does search engine communicate this?
Retrieving and Ranking Audiovisual Materials
● Hard to match keywords against the AV signal directly
○ Option: use a text representation of the AV document
■ E.g. metadata, description, script, speech transcript, ...
○ Option: use an AV representation of the query
■ E.g. an example document or a user recording
■ Use audio or visual similarity (again, similar how?)
Opaqueness of Interfaces and Experimentation
● Experiment to understand search functionalities (a probing sketch follows below)
○ How can you find out whether multiple search terms are combined with Boolean AND or OR operators?
○ How can you find out whether terms are stemmed/normalized?
● Phrase search:
○ What happens when you use quotation marks to group terms into a phrase?
○ How do the results compare to those without quotation marks?
● Proximity search:
○ Can you specify that terms should be near each other?
● Fuzzy search: wildcard and edit distance searches
○ Controlling lexical variation vs. uncontrolled wildcard search
○ voetbal+voetballen vs. voetbal* (matches voetbalvereniging, voetbalveld, ...)
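One way to run such experiments programmatically is to compare hit counts across probe queries. The sketch below assumes a hypothetical search endpoint: the URL, parameter name and response shape are placeholders, not a real GLAM or Media Suite API.

```python
import requests  # third-party HTTP library (pip install requests)

# Hypothetical endpoint: URL, "q" parameter and "total" field are assumptions.
SEARCH_URL = "https://example.org/api/search"

def hits(query: str) -> int:
    """Total hit count the engine reports for a query."""
    resp = requests.get(SEARCH_URL, params={"q": query})
    resp.raise_for_status()
    return resp.json()["total"]

a = hits("amsterdam")
b = hits("rotterdam")
ab = hits("amsterdam rotterdam")
# If ab is at most min(a, b), the terms are likely ANDed;
# if ab approaches a + b, they are likely ORed.
print(a, b, ab)

# Stemming probe: near-identical counts for singular and plural
# forms suggest the engine normalizes word forms.
print(hits("voetbal"), hits("voetballen"))
```

The same count-comparison trick works through a web interface by hand: note the reported totals for each probe query and reason from the arithmetic.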
Exercise
● Experiment with the Search and Compare tools of the CLARIAH Media Suite
○ Find out if stopwords are removed
○ Find out if words are stemmed/normalized
○ Find out how multi-word queries are interpreted, i.e. as AND or OR
○ Find out how standard search operators work
■ Boolean AND, OR and NOT
■ Quotation marks for phrases
3. Searching in Digital Collections
Nature of Digital Collections
● Collections of GLAMs are often built up over decades
○ Based on aims and selection criteria
■ Rarely "complete", dependent on availability of materials
○ Digital access via digitization, or digital archiving (born-digital)
■ Some things are lost in this process (e.g. context, quality, …)
● Heterogeneity: mix of object/source types (sub-collections)
○ Different modalities, different ways of accessing and presenting
■ Text vs. Image vs. AV vs. 3D (or 4D)
Nature of Metadata
● Digital access via metadata
○ Metadata: data about the object/source
○ Types: formal, structural, technical, administrative, aboutness
○ Metadata fields allow selection and search via specific fields
■ Title, description, creator, creation date, genre, …
○ Allows (seemingly) uniform access to heterogeneous collections
■ But different materials have different aspects to describe
■ Edition is relevant for books and films, not so much for paintings
● Metadata creation process
○ Often done with limited time, information and system flexibility
○ Inherently subjective, especially content analysis
● Size matters
○ Requirements change as the size of the collection grows (also depends on expectations)
Archival Structure and NISV Audiovisual Collection
● Hierarchical organization
○ 4 levels
■ Series: De Wereld Draait Door
■ Season: De Wereld Draait Door 2016
■ Program: De Wereld Draait Door 21-06-2016
■ Segment: De Wereld Draait Door 21-06-2016
○ Each level has a metadata record (with overlap in fields, e.g. title)
● Follows archival standard (see the sketch below)
○ Describe each aspect at the highest relevant level
○ Don’t repeat at lower levels unless it deviates (e.g. main titles)
○ Fonds: aggregation of documents from the same origin
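A minimal sketch of the describe-at-the-highest-level convention: a lower-level record inherits field values from its ancestors unless it overrides them. The class and field names are our own illustration, not the NISV schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    level: str                        # series, season, program or segment
    fields: dict
    parent: Optional["Record"] = None

    def get(self, name: str):
        """Field lookup that falls back to ancestor levels."""
        if name in self.fields:
            return self.fields[name]
        return self.parent.get(name) if self.parent else None

series = Record("series", {"title": "De Wereld Draait Door", "genre": "talkshow"})
season = Record("season", {"title": "De Wereld Draait Door 2016"}, parent=series)
program = Record("program", {"date": "2016-06-21"}, parent=season)

print(program.get("title"))  # inherited from the season level
print(program.get("genre"))  # inherited from the series level
```

This inheritance matters for search: a query on "genre" only matches the program if the engine propagates (or indexes) values across levels.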
Digital Source and Data Criticism
● Power of the archive
○ Problem of perspective (from archive-as-source to archive-as-subject, Stoler 2002)
● History of the archive
○ Collections created over decades often go through changes in
■ selection criteria, cataloguers (human or algorithmic),
■ cataloguing budgets, policies, rules, practices and vocabularies,
■ software (migrations and updates), hardware,
■ institutional mission, societal attitudes, …
○ Most of these aspects remain undocumented or only partially documented
● Consequences
○ Almost inherently incomplete, inconsistent and sometimes necessarily incorrect
○ After many years, it's hard to retrace what happened
■ and how it affects access, selection and analysis
[Figure: “Metadata in theory” vs. “Metadata in practice” (source: Jaap Kamps)]
Combined Collections
● Several portals combine (heterogeneous) collections
○ Examples:
■ Europeana, European Newspapers, EUscreen, Nederlab, Delpher, Online Archives of California, …
○ Worldwide aggregated collections:
■ ArchiveGrid (1,000+ archives): over 5M finding aids
■ WorldCat (72,000 libraries): 400M records, 2.6B assets, 100M persons
● Huge challenge for source criticism as well as search
○ Collections vary in size, provenance, selection criteria, metadata policies, interpretation and richness
○ Heterogeneous metadata schemas have been mapped to a single schema
■ This causes problems for interpretation
■ E.g. what does "creator" mean for paintings, films, tv series, letters, advertisements, ...?
Assessing Metadata Quality
● Questions
○ What are the pitfalls in relying on metadata?
○ How can we evaluate metadata quality?
○ What are relevant aspects to consider?
● Collection inspection
○ In the CLARIAH Media Suite we created a tool for inspecting metadata
■ Especially useful for complex collections like the NISV audiovisual collection
■ Somewhat ad hoc, please feel encouraged to give feedback!
○ Please go to the Media Suite and open the Collection Inspector tool
■ Click on “select field to analyse” and let the interface load the data on completeness (this will take a while)
Assessing Timelines and Other Visualizations
● Timeline visualizations give a view of temporal spread
○ Very difficult to interpret properly
● Issues with absolute frequencies:
○ Collection materials are not evenly distributed over time
○ Need to compare the query-specific distribution to the collection-wide distribution (see the sketch below)
● Issues with relative frequencies:
○ Incompleteness is not evenly distributed either (use the collection inspector)
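A minimal sketch of the comparison the slide calls for: divide per-year query hits by per-year collection totals before reading a trend. The counts are toy values chosen to show how absolute and relative views diverge.

```python
# Toy per-year counts: hits for a query vs. total items in the collection.
query_hits = {1960: 20, 1970: 80, 1980: 200}
coll_totals = {1960: 1_000, 1970: 2_000, 1980: 10_000}

for year in sorted(query_hits):
    absolute = query_hits[year]
    relative = query_hits[year] / coll_totals[year]
    print(year, absolute, f"{relative:.1%}")
# Absolute counts suggest steady growth (20 -> 80 -> 200), but the
# relative share peaks in 1970 (2% -> 4% -> 2%): the 1980 "rise"
# merely tracks a bigger collection.
```

And as the slide notes, even the relative view is only as trustworthy as the completeness of the underlying metadata per period.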
Retrievability and Metadata Characteristics
● Different types of metadata fields
○ Controlled vocabulary: e.g. broadcast channel (radio or tv)
○ Number: number of episodes/seasons/segments
○ Time/date: program length, recording date
○ Free keyword/keyphrase: title, person name (tend to be non-unique)
○ Free text: description, summary, transcript, … (tend to be unique)
● Different types allow different forms of retrieval and ranking
○ Long text fields have more terms, with higher frequencies
■ Some types of programs have longer descriptions/transcripts
■ These match more queries, so they have a higher chance of being retrieved
■ The impact of long text fields on ranking depends on the relevance model!
○ Repeated values allow aggregation and navigation
Metadata and Search Facets
● Some search interfaces offer facets to narrow down search results
○ E.g. broadcaster and genre in the CLARIAH Media Suite
○ Facets provide overview and afford focusing through selection
● How do facets work? (a counting sketch follows below)
○ Based on metadata fields: a rich schema offers rich options for facets
○ Types of metadata fields: controlled vocab, number, date, keyword/phrase, free text
■ Facets work for fields with a limited range of values, so not for free-text fields
○ Long tails in facets: typically a few high-frequency values and many low-frequency ones
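Under the hood, a facet is essentially a value count over a metadata field of the current result set. A minimal sketch with toy records (the genre values are illustrative):

```python
from collections import Counter

# Toy result set: one metadata record per item.
results = [
    {"genre": "nieuws"}, {"genre": "nieuws"}, {"genre": "nieuws"},
    {"genre": "talkshow"}, {"genre": "talkshow"},
    {"genre": "documentaire"}, {"genre": "quiz"}, {"genre": "cabaret"},
]

facet = Counter(r["genre"] for r in results)
for value, count in facet.most_common():
    print(value, count)
# A few high-frequency values ('nieuws', 'talkshow') and several
# singletons: the long tail the slide mentions. A free-text field
# would yield a near-unique "value" per record, hence no usable facet.
```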
Exercise
● Experiment with the Collection Inspector of the CLARIAH Media Suite
○ Try out the collection inspector:
■ Scroll through the list of fields to get an idea of what is available
■ Look at the completeness of fields such as “genre”, “keywords” and “awards”
■ Which metadata fields are relatively complete?
■ At which archival levels are they most complete?
● Explore which fields are available and which fields make good facets
○ Explore facet distributions in the entire collection and for specific queries
4. Search Strategies and Corpus Building
Searching for Corpus Building
● Importance of selection criteria
○ Do you have to hand-pick each document?
○ Or can you select sets based on matching criteria?
○ Is representativeness important? If so, representativeness of what?
○ Or completeness? Why?
● Exploiting facets and dates
○ Filtering: align facets/dates with research focus
○ Sampling: compare across facets
■ Which facet types can you use?
○ Sampling strategies (see the sketch below)
■ Sample per facet/year (e.g. X items per facet/year)
■ Within facets, select randomly or not
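A minimal sketch of the per-facet sampling strategy: take up to X items per facet value, chosen randomly within each stratum. The items and cap are toy choices; the seed is only there to make the sketch reproducible.

```python
import random
from collections import defaultdict

random.seed(42)  # reproducible sketch

# Toy items tagged with a facet value (e.g. broadcaster or year).
items = ([("NOS", i) for i in range(50)]
         + [("VPRO", i) for i in range(8)]
         + [("TROS", i) for i in range(3)])

def sample_per_facet(tagged_items, per_facet=5):
    """Stratified sample: up to `per_facet` random items per facet value."""
    strata = defaultdict(list)
    for facet_value, item in tagged_items:
        strata[facet_value].append(item)
    return {value: random.sample(pool, min(per_facet, len(pool)))
            for value, pool in strata.items()}

print(sample_per_facet(items))
# NOS is capped at 5 while TROS keeps all 3: small facet values are
# not drowned out, at the cost of over-representing them proportionally.
```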
Tracking Context in Corpus Building
● Why were certain documents selected?
○ How were they selected?
○ What strategy was used?
○ Documenting helps in understanding and remembering these choices
● Do research goals and questions change during collection?
○ Interacting with sources during search updates knowledge structures (Vakkari 2016)
○ Updates tend to be small and incremental, hence barely noticeable
○ Explicit reflection-in-action can bring these to the surface (Koolen et al. 2018)
○ Adding annotations can also provide context
Systematic Searching
● Systematic (comprehensive) search has two factors (Yakel 2010):
○ Search strategy (user)
○ Search functionalities (system)
○ Functionalities shape/affect strategy
● Step 1: systematically search for relevant collections online
○ Different collections/sites offer different search functionalities and levels of detail
○ Explicitly address what consequences this has for your strategy and research goals
● Step 2: explore individual collections using one or more strategies
○ "Researchers need to be flexible and creative to accommodate the vagaries of cataloging practices." (Yakel 2010, p. 110)
○ Footnote and reference chasing: references often give an "information scent", suggesting other collections and items to explore
Search Strategies
● Web search strategies defined by Drabenstott (2001)
○ Discussed in an archival context by Yakel (2010)
● Five strategies
○ Synonym generation
○ Chaining
○ Name collection
○ Pearl growing
○ Successive segmentation
● Somewhat related to the information seeking patterns of Ellis (1989)
○ Starting, chaining, browsing, differentiating, monitoring, extracting
Drabenstott’s Strategies (1/2)
● Synonym generation: 1) search with a relevant term, 2) close-read results to identify related terms (wordclouds, facets), 3) search via related terms for synonyms
● Chaining: follow references/citations (explicit or implicit), identify a relevant subset and use explicit structure to explore a connected/related subset
● Name collection: search with keywords, identify relevant names, search with those names, identify related names and keywords, repeat. Similar to keyword harvesting (Burke 2011)
Drabenstott’s Strategies (2/2)
● Pearl growing: start small and focused with specific search terms, slowly expand with additional terms to broader topics/themes
● Successive segmentation: the opposite of pearl growing; start broad and increasingly zoom in and focus; e.g. make queries increasingly specific by adding (ANDing) keywords, replacing broad terms with lower-frequency terms, or selecting facets
Search Strategies and Research Phases
● Research phase
○ Exploration <-> pre-focus search phase
i. Ad hoc, no need yet for systematic search
ii. Mostly pearl growing and/or successive segmentation to determine focus
○ Analysis <-> focus search phase
i. Switch to systematic search, determine a strategy
ii. Use chaining, name collection, synonym generation (for coverage/representation, boundaries)
● But reality resists:
○ The (re)search process is very non-linear
○ The boundary between exploration and analysis is not always clear
○ Late discoveries can prompt or force new directions, ...
When To Stop
● Researchers often switch from exploration to “sort of” systematic search
○ But it is hard to remember and explain what and how you searched
○ Moreover, it is difficult to determine when to stop
○ An explicit strategy allows for stopping criteria (see the sketch below)
● Stopping criteria
○ Check the whole set/sample, all available facets, ...
○ Diminishing returns: you increasingly encounter things already seen, new relevance becomes rare
○ When stopping, make explicit (at least for yourself) when and why you stopped
● Meta-strategy:
○ Change strategy/tactics
○ E.g. successive segmentation -> harvest keywords -> switch segment -> harvest keywords, ...
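One way to make the diminishing-returns criterion explicit: track the share of not-yet-seen items per batch of results and stop once it drops below a threshold. The threshold and batch layout below are arbitrary choices for illustration, not a standard rule.

```python
def stop_on_diminishing_returns(result_batches, threshold=0.1):
    """Inspect batches until one contributes fewer than `threshold`
    new (unseen) items; return the batches actually inspected."""
    seen, inspected = set(), []
    for batch in result_batches:
        new = [doc for doc in batch if doc not in seen]
        seen.update(batch)
        inspected.append(batch)
        if len(new) / len(batch) < threshold:
            break  # new relevance has become rare: record when/why you stopped
    return inspected

batches = [["a", "b", "c"], ["c", "d", "e"], ["a", "c", "e"]]
print(len(stop_on_diminishing_returns(batches)))  # 3: stops at the all-seen batch
```

Whatever rule you use, the slide's point stands: write down the criterion and the moment you applied it, so the stopping decision is explainable later.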
Wrap Up
● Search in research
○ How to incorporate these processes in research methodology
● Large, heterogeneous collections introduce issues for research
○ Assessing incompleteness of materials
○ Assessing incompleteness, incorrectness and inconsistency of metadata
● Looking under the hood
○ Evaluating information access functionalities (search and browse)
○ Selecting an appropriate search strategy for research goals
○ Determining success/failure of searches
○ Understanding search for corpus building
References
Burke, T. 2011. How I Talk About Searching, Discovery and Research in Courses. May 9, 2011.
Drabenstott, K.M. 2001. Web Search Strategy Development. Online, 25(4), pp. 18-25.
Fickers, A. 2012. Towards a New Digital Historicism? Doing History in the Age of Abundance. VIEW Journal, Volume 1 (1). http://orbilu.uni.lu/bitstream/10993/7615/1/4-4-1-PB.pdf
Hitchcock, T. 2013. Confronting the Digital: Or How Academic History Writing Lost the Plot. Cultural and Social History, Volume 10, Issue 1, pp. 9-23. https://doi.org/10.2752/147800413X13515292098070
Hoekstra, R., M. Koolen. 2018. Data Scopes for Digital History Research. Historical Methods: A Journal of Quantitative and Interdisciplinary History, Volume 51 (2).
Koolen, M., J. van Gorp, J. van Ossenbruggen. 2018. Lessons Learned from a Digital Tool Criticism Workshop. Digital Humanities in the Benelux 2018 Conference.
Putnam, L. 2016. The Transnational and the Text-Searchable: Digitized Sources and the Shadows They Cast. American Historical Review, Volume 121, Number 2, pp. 377-402.
Vakkari, P. 2016. Searching as Learning: A Systematization Based on Literature. Journal of Information Science, 42(1), pp. 7-18.
Yakel, E. 2010. Searching and Seeking in the Deep Web: Primary Sources on the Internet. In Working in the Archives: Practical Research Methods for Rhetoric and Composition, pp. 102-118.