WebART project
Web Archive Retrieval Tools
Jaap Kamps, Richard Rogers, Arjen de Vries
Paul Doorenbosch, René Voorburg, Victor-Jan Vos
Anat Ben-David, Hugo Huurdeman, Thaer Sammar
Flickr: Luc Viatour
eHumanities Group, “New Trends in eHumanities”, Sept. 19, 2013, Meertens Institute
Data Diggin’ @ KB
WebART Goals
• Evaluating current curation and selection procedures of Web archives
• Getting insights into current use of Web archives
• Developing new methods and tools for research using Web archives
Data Diggin’ @ KB
• DMI Summer School (2012)
  • analysis of KB selection lists
• DMI Winter School (2013)
  • use of the KB dataset of nu.nl daily harvests
• Workshop: Sept ’11 Day (2013)
  • use of the full KB Web archive dataset
Data digging, part I: DMI Summer School (2012)
Data: KB selection lists
Toolset: Web-based tools
Flickr: Silvertje
Data digging, part II
• Digital Methods Winter School (Jan. ’13)
• Co-design workshop (“Living Lab”)
• New Media researchers & developers
• first use of WebARTist
Data: nu.nl daily harvests
Toolset: full-text search, Web-based tools
• New Media researchers’ interests:
  • “derive periodizations of the Web” (Web history)
  • “source hierarchy” (dominant sources in the archive)
  • “keyword uptake” (terms over time)
    • e.g. ‘GeenStijl language in the archive’
  • “accidental”/“incidental” archiving
    • e.g. ‘the guilty pleasures of the Web of innocence’
Data digging, part III: DMI “9/11 Day” (2013)
Analysis (1)
• studying the ‘archive’ vs. the ‘archived content’
• researchers’ (un)familiarity with temporal (archive) search
  • “conditioned” to Google-style searching
• high demand for export functions and aggregation features
Analysis (2)
• “data is still a crucial factor”
  • quantity & quality: inherent incompleteness & inconsistencies
  • not always clear what’s in & what’s out
    • crawl settings (e.g. depth), temporal gaps
  • “researchers always want what isn’t there”
Digging towards the future
Datasets: full KB archive
Toolset: “Toolmaker’s tools”
++
Moving beyond mere “search”
• Wayback Machine
• Search engine
• “Research” engine: explicit support for the full research task, including analysis and synthesis steps