Presentation at IIPC 2016 conference, Reykjavik, Iceland, 14 April 2016. Abstract:
Web archiving institutions have jointly harvested petabytes of archived web content, potentially an exceptionally rich data source for researchers across the globe. These web archives are multidimensional by nature. First, a temporal dimension arises from the different versions of web content accumulated over time. Second, a hierarchical dimension is implied, as web archives may be examined at different analytical levels (Brügger, 2010); examples include the level of the web sphere, the website and the web page.
Scholars often focus their analysis on a specific analytical level and temporal range, for example looking at electoral web spheres at election times (Xenos and Bennet, 2007) or hyperlinking in news websites across time (Karlsson et al., 2015). However, we claim that this scholarly practice is not well supported by current web archive access tools, which usually allow access only at the page level and do not offer insights into the temporal development of broader selections of archived web content, such as web spheres or websites. Hence, there is a need for more flexible access services in a research context.
In this presentation, we conceptually and practically explore how to address this mismatch. We illustrate how the temporal dimension can be harnessed by aggregating web content over different time ranges, and how the hierarchical dimension can be accommodated by novel aggregation support. Using a concrete use case, we illustrate the potential usefulness of these representations of aggregated web content. We analyze and compare the temporal evolution of various categories of websites in the Dutch Web Archive (such as news, history-related and government websites) across a five-year period. In this analysis, we look at the evolution of textual content, internal structure and image content across categories and websites. Finally, our presentation indicates how these types of aggregated representations may be integrated into future search systems for web archives.
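To make the idea of aggregating along both dimensions concrete, the following is a minimal sketch in Python and not the implementation used in the presentation. It groups archived captures by an analytical level (page, website or category) and by a time range (month or year), then counts captures and words per group. The capture records, the `CATEGORIES` mapping, the example hosts and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from datetime import date
from urllib.parse import urlsplit

# Hypothetical capture records: (URL, capture date, extracted text).
CAPTURES = [
    ("http://news.example.nl/a", date(2010, 3, 1), "election results today"),
    ("http://news.example.nl/b", date(2010, 9, 12), "weather and sports"),
    ("http://news.example.nl/a", date(2011, 3, 1), "election results updated today"),
    ("http://gov.example.nl/", date(2010, 5, 20), "ministry policy notice"),
]

# Hypothetical mapping from website host to website category.
CATEGORIES = {"news.example.nl": "news", "gov.example.nl": "government"}


def level_key(url, level):
    """Map a URL to its group at the chosen analytical level."""
    host = urlsplit(url).netloc
    if level == "page":
        return url
    if level == "website":
        return host
    if level == "category":
        return CATEGORIES.get(host, "uncategorized")
    raise ValueError(f"unknown level: {level}")


def period_key(day, granularity):
    """Map a capture date to the time range it falls into."""
    if granularity == "year":
        return str(day.year)
    if granularity == "month":
        return f"{day.year}-{day.month:02d}"
    raise ValueError(f"unknown granularity: {granularity}")


def aggregate(captures, level, granularity):
    """Count captures and words per (level group, time range) pair."""
    buckets = defaultdict(lambda: {"captures": 0, "words": 0})
    for url, day, text in captures:
        key = (level_key(url, level), period_key(day, granularity))
        buckets[key]["captures"] += 1
        buckets[key]["words"] += len(text.split())
    return dict(buckets)


if __name__ == "__main__":
    for key, stats in sorted(aggregate(CAPTURES, "website", "year").items()):
        print(key, stats)
```

Changing the `level` and `granularity` arguments moves the same data between views, for example from websites per year to categories per month. A real analysis would replace the word count with richer measures of textual content, internal structure or image content.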