Towards Research Engines: Supporting Search Stages in Web Archives (2015)

WebART project
Web Archive Retrieval Tools
Jaap Kamps, Richard Rogers, Arjen de Vries
Hildelies Balk, René Voorburg
Thaer Samar, Hugo Huurdeman, Sanna Kumpulainen

Hugo Huurdeman
University of Amsterdam
huurdeman@uva.nl
Towards Research Engines: Supporting Search Stages in Web Archives
webarchiving.nl
Web Archives as Scholarly Sources conference, Aarhus University, 10 June 2015
Introduction
• Web archives preserve the fast-changing Web
• By now they contain petabytes of valuable Web data
• This could be a valuable resource; however, archives have not frequently been used for research
• Several underlying reasons exist. Here, the focus is on potential limitations in access
The concept of ‘task-sharing’
• We look at the concept of task-sharing (Beaulieu, 2000)
• That is, how should we design web archive access systems to better facilitate task-sharing between scholar and system?
• Bottom-up approach: looking at scholars’ use of Web data, and at how current systems support scholars’ needs
[Figure: a research task shared between scholar and system]
1 Scholars’ use of web data & current support
1.1 Study: scholars’ research phases
• Exploratory analysis of scholars’ research tasks (journal papers)
  • scholars using temporal Web data
• Use research phases as a ‘lens’ to analyze these papers
1.1 Background: Research Phases
• Various scholars have defined different stages occurring in research tasks (Bronstein ’07; Chu ’99; Meho & Tibbo ’03)
• Specifically, Brügger (2014) has defined several research phases relevant to web archive research:
  1. Corpus creation
  2. Analysis
  3. Dissemination
1.2 Study: scholars’ research phases
• Method:
  • querying EBSCOhost using the CMMC (Communication & Mass Media Complete) and LISTA (Library, Information Science & Technology Abstracts) databases
  • selecting all journal papers (2007-2015) which contain longitudinal analyses (excluding computer science papers)
1.2 Study: literature corpus overview
• 18 papers (17 distinct first authors)
• Main areas:
  • Information Science
  • Communication
  • New Media
  • Political Science
1.2 Study: literature corpus overview
• Observation: various ways of corpus definition, analysis and dissemination in journal papers
• However, most papers in this literature set did not use Web archives as a data source
• This corresponds to the large gap between the potential community addressed by web archives and the small group actually using them thus far (Dougherty & Meyer, 2014)
1.3.1 Study results: Corpus definition phase
• 1. selecting webpages or websites, e.g. based on authoritative lists (13)
  e.g. a list of insurance companies (Waite & Harrison, 2007)
• 2. querying regular search engines (5)
  e.g. the term ‘informetrics’ (Bar-Ilan, 2009), descriptors of youth movements (Xenos & Bennett, 2007)
• 3. taking a sample of webpages (4)
  e.g. one week per month (Li et al., 2014); to reduce the large size of a corpus, or data bias (John, 2013)
• Often: combination of methods
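The three corpus definition methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Capture` record, the function names and the example URLs are hypothetical and do not come from the talk or from any archive's API.

```python
import random
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Capture:
    """One archived page version (hypothetical record layout)."""
    url: str
    day: date
    text: str


def select_by_list(captures, hosts):
    """Selection: keep captures whose URL contains a host from an authoritative list."""
    return [c for c in captures if any(h in c.url for h in hosts)]


def query(captures, term):
    """Query: keep captures whose text mentions the term (case-insensitive)."""
    return [c for c in captures if term.lower() in c.text.lower()]


def first_week_of_month(captures):
    """Sample: keep captures from days 1-7 of each month (one week per month)."""
    return [c for c in captures if c.day.day <= 7]


def random_sample(captures, k, seed=0):
    """Sample: draw up to k captures at random, reproducibly via a fixed seed."""
    return random.Random(seed).sample(captures, min(k, len(captures)))


# Made-up example data, for illustration only.
captures = [
    Capture("http://insurer-a.example/about", date(2014, 3, 2), "Our mission statement"),
    Capture("http://insurer-a.example/news", date(2014, 3, 20), "Informetrics conference"),
    Capture("http://blog.example/post", date(2014, 4, 5), "A post on informetrics"),
]

# Combination of methods: selection from a list, then a one-week-per-month sample.
corpus = first_week_of_month(select_by_list(captures, ["insurer-a.example"]))
queried = query(captures, "informetrics")
```

Because each method takes and returns a plain list of captures, the methods can be chained in any order, which mirrors the observation that papers often combine them.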
1.3.1 Study results: Corpus definition phase
[Figure: diagram of the corpus definition methods (Query, Selection, Sample) and their combinations; values shown: 13, 5, 1, 3, 4]
1.3.1 Support: Corpus definition phase
• Current support:
  • Most: Selecting URLs (Wayback Machine)
  • Many: Querying the contents of the archive
  • Few: Selecting (predefined) categories
  • Very few: Sampling contents of the archive
• Current limitations:
  • Defining, saving & sharing of corpora
  • Document-centric access methods [Hockx-Yu, 2014]
  • Limitations of search [Ben-David & Huurdeman, 2014]
1.3.2 Results: Analysis phase (1/2)
• Content analysis, manual coding (66.7%)
  • coding schemes, at times based on existing frameworks
• Content analysis, automatic (22.2%)
  • existing or custom-developed tools
• Network analysis (11.1%)
  • Issue Crawler, link classifications
1.3.2 Results: Analysis phase (2/2)
• Level of analysis (based on Brügger, 2013):
  • page element (4) (22%), e.g. mission statements
  • web page (6) (33%), e.g. blog pages
  • web site* (7) (39%), e.g. political actors’ sites
  • web sphere (1) (6%), e.g. youth web sphere
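As a quick check, the percentages per level of analysis follow from the counts over the 18 papers in the literature corpus. The short sketch below recomputes them; the variable names are illustrative only.

```python
# Counts per level of analysis as listed on the slide.
levels = {"page element": 4, "web page": 6, "web site": 7, "web sphere": 1}

# The counts should add up to the 18 papers in the literature corpus.
total = sum(levels.values())

# Share of each level, rounded to whole percentages.
shares = {name: round(100 * n / total) for name, n in levels.items()}

for name, pct in shares.items():
    print(f"{name}: {levels[name]} of {total} ({pct}%)")
```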
1.3.2 Support: Analysis phase
• Current support:
  • Very few: analysis (n-gram, trends), export options
• Current limitations:
  • Generally not applicable to custom corpora
  • No ways to define granularity of results
  • Scholars often have to resort to script-based analysis tools
  • Lack of integrated content analysis, coding support, etc.
1.3.3 Results: Dissemination phase
• Tables (16)
• Graphs (10)
• Link networks (1)
• Model (1)
1.3.3 Support: Dissemination phase
• Current support:
  • some visualization options (n-gram, tag clouds)
• Current limitations:
  • Set of visualizations depends on archive
  • Generally not applicable to user-defined corpora
1.4 Summary
• Observation: omissions in current support for corpus creation, analysis and dissemination in a research context
• Opportunities arise to increase task-sharing in future systems
2 From Search to Research engines
2.1 Supporting the flow (1/2)
• How to combine this varied set of features into an integrated access system?
  • with high usability and without cognitive overload
• Traditional approach: “complex” interface integrating all functionality
[Figure: interface example from Dunne et al., 2012]
2.1 Supporting the flow (2/2)
• Our approach: divide functionality per (research) stage
• Inspired by ongoing work on supporting the flow of Web and book search in multistage interfaces, based on cognitive models of the search process [Huurdeman & Kamps, 2014; Huurdeman, Kamps, Koolen & Kumpulainen, 2015]
[Figure: Search combined with separate stages for Corpus Creation, Analysis and Visualization]
2.2 Current research prototypes: based on the Dutch Web archive
• National Library of the Netherlands (KB)
  • Selective Web archive (2007-now)
  • 10+ terabytes (25,000+ harvests)
• Idea: modular system
2.2.1 Supporting research phases: corpus creation
• faceted search interface
• different modalities to explore results
• possibility to
  • save (complex) queries
  • save results
  • categorize
[Figure: Search and Corpus Creation interface, with saved queries]
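To make the idea of saving queries, saving results and categorizing them more concrete, here is a minimal data-structure sketch. It is not the prototype's actual implementation: the class names, fields and example values are all hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SavedQuery:
    """A stored faceted query (hypothetical fields)."""
    terms: str
    facets: dict = field(default_factory=dict)  # e.g. {"year": "2012"}


@dataclass
class Corpus:
    """A named corpus: saved queries plus categorized results."""
    name: str
    queries: list = field(default_factory=list)
    results: dict = field(default_factory=dict)  # URL -> category label

    def save_query(self, saved_query):
        """Keep a (complex) query so it can be re-run or shared later."""
        self.queries.append(saved_query)

    def save_result(self, url, category="uncategorized"):
        """Add a result to the corpus, or re-categorize an existing one."""
        self.results[url] = category

    def to_json(self):
        """Serialize the corpus definition so it can be saved and shared."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Made-up example values, for illustration only.
corpus = Corpus("elections-2012")
corpus.save_query(SavedQuery("verkiezingen", {"year": "2012"}))
corpus.save_result("http://party.example/programme", category="political actors")
exported = json.loads(corpus.to_json())
```

Serializing the whole corpus definition addresses the limitation noted earlier, that defining, saving and sharing corpora is poorly supported.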
2.2.1 Supporting research phases: corpus creation
• Further customization ‘under the hood’: define search strategy
  • via visual building blocks
  • flexibility in defining a corpus (determine selection, ranking, queries, etc.) [De Vries et al., 2010]
  • see also: spinque.com
2.2.2 Supporting research phases: analysis
• Analysis interface
  • edit/annotate dataset
  • search & browse dataset
  • analyze
2.2.3 Supporting research phases: dissemination
• Visualization interface
  • based on RAW (raw.densitydesign.org)
  • visualize datasets (graphs and visualizations)
2.3 Caveats & discussion
• Looking at access aspects
  • not at underlying data & its properties
  • next step: contextualizing ‘completeness’ of results [see Huurdeman, Kamps, Samar, De Vries, Ben-David & Rogers, 2015]
• Slightly utopian vision: not all analysis can be supported
  • generic versus specific approaches
  • towards ‘toolmaker’s tools’
• Different archives offer different toolsets
  • Importance of sharing (open-source) and collaboration
2.4 Conclusion
• Exploratory analysis of scholars’ choices related to corpus definition, analysis and dissemination
• These choices revealed a number of limitations of current access interfaces
• Therefore, we propose a more fluid approach, moving from mere search to ‘research engines’
[Figure: from Wayback Machine, via search engine, to ‘research’ engine]
@webart12
Thanks & Acknowledgements
• The WebART team (’12-’16): Jaap Kamps, Richard Rogers, Arjen de Vries, Thaer Samar, Sanna Kumpulainen; and Anat Ben-David.
• We gratefully acknowledge the collaboration with the Dutch Web Archive of the National Library of the Netherlands.
• This research was supported by the Netherlands Organization for Scientific Research (WebART project, NWO CATCH # 640.005.001).
References
• Beaulieu, M. (2000). Interaction in information searching and retrieval. Journal of Documentation, 56(4), 431–439.
• Ben-David, A., & Huurdeman, H. (2014). Web Archive Search as Research: Methodological and Theoretical Implications. Alexandria, 25(1).
• Bronstein, J. (2007). The role of the research phase in information seeking behaviour of Jewish scholars: a modification of Ellis’s behavioural characteristics. Information Research, 12(3), paper 318. Retrieved April 20, 2015, from http://www.informationr.net/ir/12-3/paper318.html
• Brügger, N. (2014). Concluding Remarks. International Internet Preservation Consortium General Consortium. Paris, France. Retrieved April 19, 2015, from http://netpreserve.org/sites/default/files/attachments/Brugger.ppt
• Brügger, N. (2013). Historical Network Analysis of the Web. Social Science Computer Review, 31(3), 306–321.
• Chu, C. M. (1999). Literary critics at work and their information needs: A research-phases model. Library & Information Science Research, 21(2), 247–273.
• Dunne, C., Shneiderman, B., Gove, R., Klavans, J., & Dorr, B. (2012). Rapid understanding of scientific paper collections: Integrating statistics, text analytics, and visualization. Journal of the American Society for Information Science and Technology, 63(12), 2351–2369.
• Hockx-Yu, H. (2014). Access and Scholarly Use of Web Archives. Alexandria, 25(1-2), 113–127.
• Huurdeman, H., Kamps, J., Samar, T., de Vries, A., Ben-David, A., & Rogers, R. (2015). Finding Pages in the Unarchived Web. International Journal on Digital Libraries.
• Huurdeman, H., Kamps, J., Koolen, M., & Kumpulainen, S. (forthcoming). The Value of Multistage Interfaces for Book Search. CEUR-WS.
• Huurdeman, H., & Kamps, J. (2014). From Multistage Information-seeking Models to Multistage Search Systems. In Proceedings of the 5th Information Interaction in Context Symposium (pp. 145–154). New York, NY, USA: ACM.
• Meho, L. I., & Tibbo, H. R. (2003). Modeling the information-seeking behavior of social scientists: Ellis’s study revisited. Journal of the American Society for Information Science and Technology, 54(6), 570–587.
• Rogers, R. (2013). Digital Methods. MIT Press.
• de Vries, A., Alink, W., & Cornacchia, R. (2010). Search by Strategy. Proc. ESAIR ’10.