1.
WebART project
Web Archive Retrieval Tools
Jaap Kamps, Richard Rogers, Arjen de Vries
Paul Doorenbosch, René Voorburg, Victor-Jan Vos
Anat Ben-David, Hugo Huurdeman, Thaer Samar
Flickr: Luc Viatour
eHumanities Group, “New Trends in eHumanities”, Sept. 19 2013, Meertens Institute
2.
Data Diggin’ @ KB
3.
Contents
•The WebART project & KB Web archive
•Data Diggin’ @ KB
•Analysis
•Digging Towards the Future
5.
Thaer Samar
PhD/programmer
Hugo Huurdeman
PhD researcher
Anat Ben-David
Postdoc
Arjen de Vries, Jaap Kamps, Richard Rogers
Paul Doorenbosch
Hildelies Balk
Victor-Jan Vos
René Voorburg
6.
WebART Goals
•Evaluating current curation and selection
procedures of Web archives
•Getting insights into current use of Web
archives
•Developing new methods and tools for
research using Web archives
11.
Data Diggin’ @ KB
•DMI Summer School (2012)
• analysis of selection lists KB
•DMI Winter School (2013)
• use of nu.nl daily harvests KB dataset
•Workshop: Sept ‘11 Day (2013)
• use of full Web archive KB dataset
12.
DMI Summer School (2012)
Data digging, part I
Data: Selection lists KB
Toolset: Web-based tools
Flickr: Silvertje
14.
• Digital Methods Winter School (Jan. ’13)
• Co-design workshop (“Living Lab”)
• New Media researchers & developers
• first use WebARTist
Data digging, part II
Data: nu.nl daily harvests
Toolset: Full-text search, Web-based tools
30.
•New Media researchers’ interests:
• “derive periodizations of the Web” (Web history)
• “source hierarchy” (dominant sources in archive)
• “keyword uptake” (terms over time)
• e.g.‘geenstijl language in archive’
• “accidental”/“incidental” archiving
• e.g.‘the guilty pleasures of the Web of innocence’
DMI “9/11 Day” (2013)
Data digging, part III
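The “keyword uptake” interest (terms over time) can be illustrated with a minimal sketch. It assumes each archived capture is available as a (timestamp, text) pair with a 14-digit YYYYMMDDhhmmss timestamp; the function name `keyword_uptake` and the sample records are hypothetical and are not taken from the KB archive or from WebARTist.

```python
from collections import Counter


def keyword_uptake(captures, term):
    """Count, per year, how many captures mention `term` (case-insensitive)."""
    counts = Counter()
    needle = term.lower()
    for timestamp, text in captures:
        year = timestamp[:4]  # assumes a YYYYMMDDhhmmss timestamp string
        if needle in text.lower():
            counts[year] += 1
    return dict(sorted(counts.items()))


# Hypothetical sample records, for illustration only.
sample = [
    ("20090312120000", "Nieuws over de verkiezingen"),
    ("20100105080000", "Reaguurders op het web"),
    ("20100917093000", "De reaguurders zijn terug"),
    ("20110220101500", "Weerbericht"),
]

print(keyword_uptake(sample, "reaguurders"))
```

Counting captures rather than raw term occurrences keeps one very long page from dominating a year; either choice would be a reasonable starting point.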
35.
Analysis (1)
• studying the ‘archive’ vs. the ‘archived content’
• researchers’ (un)familiarity with temporal (archive)
search
• “conditioned” to Google-style searching
• high demand for export functions and aggregation
features
36.
Analysis (2)
•“data is still a crucial factor”
• quantity & quality: inherent incompleteness &
inconsistencies
• not always clear what’s in & what’s out
• crawl settings (e.g. depth), temporal gaps
• “researchers always want what isn’t there”
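Temporal gaps of the kind mentioned above could be surfaced with a small check such as the following sketch. It assumes the harvest dates of a site are known and that a daily harvest is expected; the function name `find_gaps` and the sample dates are hypothetical.

```python
from datetime import date, timedelta


def find_gaps(capture_dates, max_interval=timedelta(days=1)):
    """Return (earlier, later) pairs of consecutive captures that lie
    further apart than `max_interval`."""
    ordered = sorted(set(capture_dates))
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > max_interval:
            gaps.append((earlier, later))
    return gaps


# Hypothetical harvest dates, for illustration only.
harvests = [
    date(2013, 1, 1),
    date(2013, 1, 2),
    date(2013, 1, 5),
    date(2013, 1, 6),
]

print(find_gaps(harvests))
```

Reporting such gaps next to search results is one way to make clearer what is in the archive and what is not.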
37.
Digging towards the future
Datasets: Full KB Archive
Toolset: “Toolmaker’s tools”
38.
A step further...
•Build customizable systems, or,
toolmakers’ tools
•Provide building blocks
39.
A step further...
Use Hadoop computing power to build custom datasets, perform high-level analysis, etc.
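As a rough illustration of the kind of job this implies, the sketch below imitates the map and reduce steps in plain Python to count captures per host and year. It does not use any Hadoop API; the record layout (URL plus 14-digit timestamp), the function names, and the sample records are assumptions made for the example.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def map_capture(record):
    """Map step: emit ((host, year), 1) for one (url, timestamp) record."""
    url, timestamp = record
    host = urlsplit(url).hostname or ""
    yield (host, timestamp[:4]), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical capture records, for illustration only.
records = [
    ("http://www.nu.nl/", "20120101120000"),
    ("http://www.nu.nl/economie", "20120102120000"),
    ("http://www.kb.nl/", "20130101120000"),
]

pairs = [pair for record in records for pair in map_capture(record)]
print(reduce_counts(pairs))
```

On a real cluster the same two functions would run in parallel over partitions of the archive; the resulting per-host, per-year counts are one example of a custom dataset for further analysis.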
41.
Moving beyond mere “search”
Wayback Machine
Search engine
“Research” engine: explicit support for the full research task, including analysis and synthesis steps