This document proposes a framework for aggregating private and public web archives. It introduces two new entities: the Memento Meta Aggregator (MMA) and the Private Web Archive Adapter (PWAA). The MMA allows for dynamic inclusion of archives and recursive construction of archive sets. The PWAA regulates access to private web archives by authenticating requests and relaying results. This framework enables private archives to be included in aggregations while preserving privacy through access control and authentication via the PWAA.
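The core operation such an aggregator performs is merging the per-archive lists of mementos (archived copies) for one original URI into a single ordered TimeMap. A minimal offline sketch of that merge step, with hypothetical archive names and URI-Ms (real aggregators obtain these lists via the Memento HTTP API):

```python
from datetime import datetime, timezone

def aggregate_timemaps(timemaps):
    """Merge per-archive memento lists into one TimeMap,
    ordered by archival datetime; duplicate URI-Ms are kept once."""
    merged, seen = [], set()
    for archive, mementos in timemaps.items():
        for dt, urim in mementos:
            if urim not in seen:
                seen.add(urim)
                merged.append((dt, urim, archive))
    return sorted(merged, key=lambda m: m[0])

# Hypothetical responses from a public and a private archive
# for the same original URI
timemaps = {
    "public-archive": [
        (datetime(2010, 5, 1, tzinfo=timezone.utc),
         "http://public.example/web/2010/http://a.org/"),
    ],
    "private-archive": [
        (datetime(2012, 3, 9, tzinfo=timezone.utc),
         "http://private.example/web/2012/http://a.org/"),
    ],
}
for dt, urim, archive in aggregate_timemaps(timemaps):
    print(dt.date(), archive, urim)
```

In the proposed framework, the private-archive entries would only appear for requests the PWAA has authenticated; the merge itself is unchanged.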
This document introduces the Archival Acid Test, which evaluates how well web archiving tools archive modern webpages that use advanced HTML, JavaScript, and other web technologies. The test is divided into basic tests, JavaScript tests, and advanced features tests to assess different areas. Results show that archiving tools perform well on basic tests but struggle with dynamic content, asynchronous JavaScript, iframes, and other complex features. The goal of the Archival Acid Test is to create a standardized, publicly available way to evaluate how completely archiving tools archive modern webpages and identify areas for improvement.
Facilitation of the A Posteriori Replication of Web Published Satellite Imagery - Mat Kelly
The document proposes using ResourceSync, BitTorrent, and WebRTC to facilitate the a posteriori replication of satellite imagery published on NASA web servers. It describes using a crawler to discover imagery resources and produce metadata, which is then used by adapter software to invoke a BitTorrent-based distribution of image payloads to users. The approach was constructed as a proof-of-concept to distribute data and mitigate reliance on NASA servers as the single source. Evaluation showed it was effective but temporally expensive, and future work could better integrate ResourceSync and utilize the YAML metadata.
Presented by Michele C. Weigle, June 4, 2015
Columbia University Web Archiving Collaboration: New Tools and Models
Work by Yasmin AlNoamany, Michele C. Weigle, and Michael L. Nelson
Digital Collection Management with CONTENTdm and Omeka - Gena Chattin
This document compares two digital content management systems: CONTENTdm and Omeka. Both systems have costs and require backup and preservation. CONTENTdm is hosted, while Omeka can be hosted or run in-house. CONTENTdm has content and design limits as a hosted system. Omeka has more flexibility but requires local maintenance. The document provides detailed comparisons of features and functionality between the two systems. It concludes that the best system depends on an organization's storage needs, customization requirements, data standards, resources, and budget.
Jabes 2008 - Conférence inaugurale, la grande révélation : penser les ressour... - ABES
Jabes 2008 inaugural lecture, the great revelation: thinking about the library's resources at the scale of the web - Lorcan Dempsey, as part of the Journées Abes 2008.
This document provides an overview of the ResourceSync framework for synchronizing web resources between a source and destinations. It describes the key capabilities a source can provide, including describing available content through resource lists and dumps, describing changes through change lists and dumps, and archiving capability documents. Destinations need baseline and incremental synchronization, and the ability to audit synchronization status. Use cases demonstrate the need for high-volume, low-latency synchronization between sources like arXiv and DBpedia. The framework supports modular capabilities that destinations can use selectively for efficient synchronization aligned with web standards.
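A resource list, the most basic of these capabilities, is a sitemap `<urlset>` extended with a ResourceSync metadata element declaring the capability. A minimal sketch of generating one (the source URL is a placeholder):

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS = "http://www.openarchives.org/rs/terms/"

def resource_list(resources):
    """Build a minimal ResourceSync Resource List: a sitemap <urlset>
    whose <rs:md> element declares the 'resourcelist' capability."""
    ET.register_namespace("", SM)
    ET.register_namespace("rs", RS)
    urlset = ET.Element(f"{{{SM}}}urlset")
    ET.SubElement(urlset, f"{{{RS}}}md", capability="resourcelist")
    for loc, lastmod in resources:
        url = ET.SubElement(urlset, f"{{{SM}}}url")
        ET.SubElement(url, f"{{{SM}}}loc").text = loc
        ET.SubElement(url, f"{{{SM}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

doc = resource_list([("http://source.example/res1", "2017-01-03")])
```

A change list has the same sitemap shape with `capability="changelist"` and per-resource change metadata, which is what makes incremental synchronization possible.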
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a... - Martin Klein
This document provides an overview of ResourceSync, which is a framework for synchronizing web resources between systems. Some key points:
- ResourceSync was created to address limitations of existing protocols like OAI-PMH by allowing synchronization of any web resource and enabling both one-time and ongoing synchronization.
- It supports various capabilities for synchronization like resource lists, change lists, and notifications. These can be used for initial synchronization or incremental updates.
- Real-world examples are described where ResourceSync has been implemented for projects involving aggregation of digital collections, like Europeana and CLARIAH. It facilitates synchronization between diverse data sources.
- Presentations were given on how ResourceSync could also be useful
Preservation as a Process: MetaArchive and Distributed Digital Preservation - Educopia
This document provides an overview of MetaArchive, a distributed digital preservation cooperative. It discusses MetaArchive's history and practices, including that it was founded in 2004, aims to prevent data loss through distributing copies across multiple institutions, and involves members maintaining control over their own content. The document outlines MetaArchive's membership, which involves annual fees and responsibilities like hosting a cache server. It also reviews MetaArchive's ingest process, which includes preparing content, developing collection plugins, and testing before the collection is replicated across the network.
Mind the gap! Reflections on the state of repository data harvesting - Simeon Warner
A 24x7 presentation at Open Repositories 2017 in Brisbane, Australia.
I start with an opinionated history of the evolution of repository data harvesting from the late 1990s to the present. A conclusion is that we are currently in danger of creating a repository environment with fewer cross-repository services than before, with the potential to reinforce the silos we hope to open. I suggest that the community needs to agree upon a new solution, and further suggest that solution should be ResourceSync.
This document provides an overview of content management systems (CMS) and their use for digital humanities projects. It discusses what a CMS is, popular open-source CMS platforms like WordPress and Omeka, and how to set up and customize WordPress and Omeka sites. The workshop aims to help participants understand the functionality of CMS platforms and how to choose one suitable for their project needs. The agenda includes hands-on exercises for configuring WordPress and Omeka sites.
Metadata Provenance Tutorial Part 2: Interoperable Metadata Provenance - Magnus Pfeffer
Tutorial held at the Semantic Web in Libraries conference in Hamburg, Germany, on November 25, 2013. The tutorial was held together with Kai Eckert, who presented Part 1.
Abstract:
When metadata is distributed, combined, and enriched as Linked Data, the tracking of its provenance becomes a hard issue. Using data encumbered with licenses that require attribution of authorship may eventually become impracticable as more and more data sets are aggregated - one of the main motivations for the call to open data under permissive licenses like CC0. Nonetheless, there are important scenarios where keeping track of provenance information becomes a necessity. A typical example is the enrichment of existing data with automatically obtained data, for instance as a result of automatic indexing. Ideally, the origins, conditions, rules and other means of production of every statement are known and can be used to put it into the right context.
Part 1 - Metadata Provenance in RDF: In RDF, the mere representation of provenance - i.e., statements about statements - is challenging. We explore the possibilities, from the unloved reification and other proposed alternative Linked Data practices through to named graphs and recent developments regarding the upcoming next version of RDF.
Part 2 - Interoperable Metadata Provenance: As with metadata itself, common vocabularies and data models are needed to express basic provenance information in an interoperable fashion. We investigate the PROV model that is currently developed by the W3C Provenance Working Group and compare it to Dublin Core as a representative of a flat, descriptive metadata schema.
We actively encourage participants to present their own use cases and open challenges at this workshop. Please contact the organizers for details.
Prior experience: The workshop is intended for participants who have mastered the basics of linked data and want to delve into expressing provenance. Beyond a basic understanding of RDF, the linked data principles, and the use of ontologies (such as Dublin Core or Bibo) to express bibliographic metadata, no specialised knowledge is required.
Slides for a presentation made at the Archives Association of British Columbia's 2016 Annual Conference, April 15, 2016, held in Vancouver, BC, Canada.
The slides aim to provide users with a basic introduction to some of the key considerations when implementing a digital preservation plan, describing the workflow with a series of cooking-related references.
Web Archive services framework for tighter integration between the past and ... - Ahmed AlSum
This document describes Ahmed AlSum's PhD defense from February 2014 at Old Dominion University. It discusses his proposal for a Web Archive Services Framework to provide tighter integration between past and present web content. The framework includes several proposed services - a Content Service to access archived web pages, a Metadata Service to retrieve metadata like page titles and thumbnails, a URI Service to handle URI lookups across archives using HTTP redirection, and an overarching Archive Service. The goal is to develop standardized APIs and services to make archived web content more programmatically accessible and help researchers analyze trends over time.
This presentation looks back at several efforts, conducted in the past fifteen years, aimed at establishing interoperability for web-based scholarly communication. It tries to characterize the perspectives/approaches taken by these efforts and, based upon that, proposes a HATEOAS-based approach to interlink scholarly nodes on the web. This was first presented at the Research Data Alliance meeting in Paris, France, September 22, 2015.
This document summarizes Claire Knowles' presentation on updates from the Open Repositories 2014 conference regarding DSpace. The conference had over 460 attendees from 38 countries discussing repository topics. DSpace version 4 was recently released with new features, and version 5 is planned for late 2014 focusing on ORCID support, metadata for all objects, and streaming audio/visuals. Jisc is working on a repository shared services project to integrate key repository services and support open access.
This document provides an overview of curation and the Omeka content management system. It discusses how curation involves collecting, organizing and displaying information. Omeka is introduced as a platform developed by the Center for History and New Media to publish digital collections and exhibitions. The document reviews Omeka's core features and functionality, provides examples of how it can be used for education, and gives a brief introduction to Dublin Core metadata standards for cataloging digital objects.
Graph Structure in the Web - Revisited. WWW2014 Web Science Track - Chris Bizer
The document discusses research that revisits the graph structure of the web using a new large crawl from Common Crawl. It finds that the web has become more dense and connected over time, with the largest strongly connected component growing significantly. While previous research found power laws for in- and out-degrees, this data does not fit power laws and instead has heavy-tailed distributions. The shape of the bow-tie structure also depends on the specific crawl used. The authors provide the new crawl data and analysis to enable further research on the evolving structure of the web graph.
Talk at the 3rd DBpedia Community Meeting in Dublin about the integration of the Web Protégé ontology editor into DBpedia by the Corporate Semantic Web group at Freie Universität Berlin.
A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary ... - Robert Meusel
Promoted by major search engines, schema.org has become a widely adopted standard for marking up structured data in HTML web pages. In this paper, we use a series of large-scale Web crawls to analyze the evolution and adoption of schema.org over time. The availability of data from different points in time for both the schema and the websites deploying data allows for a new kind of empirical analysis of standards adoption, which has not been possible before. To conduct our analysis, we compare different versions of the schema.org vocabulary to the data that was deployed on hundreds of thousands of Web pages at different points in time. We measure both top-down adoption (i.e., the extent to which changes in the schema are adopted by data providers) as well as bottom-up evolution (i.e., the extent to which the actually deployed data drives changes in the schema). Our empirical analysis shows that both processes can be observed.
Cataloging Landscape Update: RDA and LC Working Group on the Future of Biblio... - kramsey
The document summarizes the report from the Library of Congress Working Group on the Future of Bibliographic Control. The report recommends that bibliographic control become more collaborative, decentralized, web-based, and international in scope. It suggests making efficiency improvements, enhancing access to special collections, positioning technology and the community for the future, and strengthening the library and information science profession. Key themes are economics, standards, cooperation, users, and research. The LC plans to analyze the recommendations and work with the library community to respond and implement changes over time.
Presented at the International Internet Preservation Consortium (IIPC) Web Archiving Week, University of London, 16 June 2017.
Web archiving has become imperative to ensure that our digital heritage does not disappear forever, yet many institutions have not begun this work. In addition, archived websites are not easily discoverable, which severely limits their use. To address this challenge, OCLC Research has established the OCLC Research Library Partnership Web Archiving Metadata Working Group to develop a data dictionary that will be compatible with library and archives standards. Three reports on this project are available in July 2017, focused on metadata best practices guidelines, user needs and behaviors, and evaluation of web archiving tools.
More information: oc.lc/wam
Contact: Jackie Dooley, dooleyj@oclc.org
Web Archive Profiling Through Fulltext Search - Sawood Alam
Through analyzing full text search results from web archives, the authors developed a method called the Random Searcher Model (RSM) to efficiently generate profiles of web archive collections with low overhead. The profiles accurately predict an archive's likelihood of containing a URI's mementos while minimizing search costs. Different RSM modes allow customization based on collection characteristics. The authors recommend profile policies and RSM modes to balance accuracy, recall, and costs depending on available archive metadata. Future work includes combining profile attributes and evaluating profiles for applications beyond memento routing.
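The routing decision such profiles support can be illustrated with a toy sketch (this is not the authors' Random Searcher Model itself, and the profile data is hypothetical): a profile maps URI-keys to estimated memento counts, and an aggregator consults it before spending a query on the archive.

```python
from urllib.parse import urlparse

def uri_key(uri):
    """Reduce a URI to a simple profile key (the hostname here;
    real archive profiles use finer-grained URI-key policies)."""
    return urlparse(uri).hostname

def should_route(profile, uri, threshold=1):
    """Route a lookup to an archive only if its profile suggests it
    holds at least `threshold` mementos for the URI's key."""
    return profile.get(uri_key(uri), 0) >= threshold

# Hypothetical profile learned for one archive
profile = {"example.org": 120, "news.example.com": 3}
should_route(profile, "http://example.org/page")      # worth querying
should_route(profile, "http://unknown.example/page")  # skip this archive
```

The accuracy/cost trade-off the authors describe corresponds to how the profile was built and how fine-grained its URI-keys are.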
Looks at hyperlinks from the perspective of a managed collection of resources for which link persistence/integrity is considered a quality of service concern. Distinguishes between links into other managed collections and to the web at large. Considers link rot and content drift.
Introduction to Linked Data Platform (LDP) - Hector Correa
The Linked Data Platform (LDP) defines rules for HTTP operations on web resources to provide an architecture for read-write Linked Data on the web. Key concepts include resources, RDF sources, non-RDF sources, and containers. LDP uses HTTP requests and responses to create, retrieve, update, and delete resources. Resources can be contained within different types of containers, including basic, direct, and indirect containers. LDP provides a standard way to manage Linked Data using HTTP.
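The create operation, for instance, is an HTTP POST to a container. A sketch that only assembles such a request as text (the host, container path, and slug are placeholders; actually creating the resource requires a live LDP server):

```python
def ldp_create_request(container_path, slug, turtle_body, host="ldp.example"):
    """Assemble an HTTP POST asking an LDP server to create an RDF
    source inside a container, using the LDP interaction-model
    Link header and a Turtle body."""
    body_bytes = turtle_body.encode("utf-8")
    headers = [
        f"POST {container_path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: text/turtle",
        f"Slug: {slug}",
        'Link: <http://www.w3.org/ns/ldp#RDFSource>; rel="type"',
        f"Content-Length: {len(body_bytes)}",
    ]
    return "\r\n".join(headers) + "\r\n\r\n" + turtle_body

req = ldp_create_request(
    "/containers/books/", "moby-dick",
    '<> <http://purl.org/dc/terms/title> "Moby Dick" .')
```

On success the server answers 201 Created with a Location header for the new resource; GET, PUT, and DELETE on that location cover the rest of the lifecycle.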
This document describes a web service that analyzes web crawl data to provide contextual information about locations. It extracts topics like weather, healthcare, crime, and employment that are relevant to a given location from common crawl data stored on Amazon S3. The system uses Apache Pig on a Hadoop cluster to analyze the data, builds an index of locations to associated words, and makes the results searchable through Elastic Search. It aims to provide useful information to people moving to new places, policy makers, journalists, and researchers.
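The indexing step described above can be illustrated with a toy in-memory version (the actual system does this at scale with Apache Pig and Elastic Search; the locations and words below are made up):

```python
from collections import defaultdict

def build_location_index(pages):
    """Invert (location, words) pairs extracted from crawl data
    into a location -> word -> frequency index."""
    index = defaultdict(lambda: defaultdict(int))
    for location, words in pages:
        for word in words:
            index[location][word.lower()] += 1
    return index

# Hypothetical extractions from two crawled pages
pages = [
    ("Norfolk, VA", ["weather", "employment", "weather"]),
    ("Dublin", ["healthcare", "crime"]),
]
index = build_location_index(pages)
index["Norfolk, VA"]["weather"]  # -> 2
```

Searching the index then amounts to looking up a location and ranking its associated topic words by frequency.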
Information sharing about Columbia University Library's recent web archiving ... - Anna Perricci
This conference at Columbia University focused on web archiving tools and models. It featured presentations on projects funded by Mellon grants that developed new tools and collaborative models for web archiving. These included projects that expanded access to legal documents, created new platforms for storing and analyzing web archives, and developed tools for curating web archive collections. The conference provided an opportunity for participants to discuss challenges and opportunities for further collaboration in web archiving.
Slides for a workshop session on "Building an Accessible Digital Institution" facilitated by Brian Kelly, Innovation Advocate, Cetis at the Cetis conference held at the University of Bolton on 17-18 June 2014.
See http://www.slideshare.net/Thebriankelly/building-an-accessible-digital-institution
Mind the gap! Reflections on the state of repository data harvestingSimeon Warner
Â
A 24x7 presentation at Open Repositories 2017 in Brisbane, Australia.
I start with an opinionated history of the evolution of repository data harvesting since the late 1990's to the present. A conclusion is that we are currently in danger of creating a repository environment with fewer cross-repository services than before, with the potential to reinforce the silos we hope to open. I suggest that the community needs to agree upon a new solution, and further suggest that solution should be ResourceSync.
This document provides an overview of content management systems (CMS) and their use for digital humanities projects. It discusses what a CMS is, popular open-source CMS platforms like WordPress and Omeka, and how to set up and customize WordPress and Omeka sites. The workshop aims to help participants understand the functionality of CMS platforms and how to choose one suitable for their project needs. The agenda includes hands-on exercises for configuring WordPress and Omeka sites.
Metadata Provenance Tutorial Part 2: Interoperable Metadata ProvenanceMagnus Pfeffer
Â
Tutorial held at the Semantic Web in Libraries conference in Hamburg, Germany, at November 25th 2013. The tutorial was held together with Kai Eckert, who did Part 1.
Abstract:
When metadata is distributed, combined, and enriched as Linked Data, the tracking of its provenance becomes a hard issue. Using data encumbered with licenses that require attribution of authorship may eventually become impracticable as more and more data sets are aggregated - one of the main motivations for the call to open data under permissive licenses like CC0. Nonetheless, there are important scenarios where keeping track of provenance information becomes a necessity. A typical example is the enrichment of existing data with automatically obtained data, for instance as a result of automatic indexing. Ideally, the origins, conditions, rules and other means of production of every statement are known and can be used to put it into the right context.
Part 1 - Metadata Provenance in RDF: In RDF, the mere representation of provenance - i.e., statements about statements - is challenging. We explore the possibilities, from the unloved reification and other proposed alternative Linked Data practices through to named graphs and recent developments regarding the upcoming next version of RDF.
Part 2 - Interoperable Metadata Provenance: As with metadata itself, common vocabularies and data models are needed to express basic provenance information in an interoperable fashion. We investigate the PROV model that is currently developed by the W3C Provenance Working Group and compare it to Dublin Core as a representative of a flat, descriptive metadata schema.
We actively encourage participants to present their own use cases and open challenges at this workshop. Please contact the organizers for details.
Prior experience: The workshop is intended for participants who have mastered the basics of linked data and want to delve into expressing provenance. Beside a basic understanding of RDF, the linked data principles and the use of ontologies (like Dublin Core or Bibo) to express bibliographic metadata no specialised knowledge is required.
Slides for a presentation made at the Archives Association of British Columbia's 2016 Annual Conference, April 15, 2016, held in Vancouver, BC, Canada.
The slides aim to provide users with a basic introduction to some of the key considerations when implementing a digital preservation plan, describing the workflow with a series of cooking-related references.
"Web Archive services framework for tighter integration between the past and ...Ahmed AlSum
Â
This document describes Ahmed AlSum's PhD defense from February 2014 at Old Dominion University. It discusses his proposal for a Web Archive Services Framework to provide tighter integration between past and present web content. The framework includes several proposed services - a Content Service to access archived web pages, a Metadata Service to retrieve metadata like page titles and thumbnails, a URI Service to handle URI lookups across archives using HTTP redirection, and an overarching Archive Service. The goal is to develop standardized APIs and services to make archived web content more programmatically accessible and help researchers analyze trends over time.
This presentation looks back at several efforts, conducted in the past fifteen years, aimed at establishing interoperability for web-based scholarly communication. It tries to characterize the perspectives/approaches taken by these efforts and, based upon that, proposes an HATEOS-based approach to interlink scholarly nodes on the web. This was first presented at the Research Data Alliance meeting in Paris, France, September 22 2015.
This document summarizes Claire Knowles' presentation on updates from the Open Repositories 2014 conference regarding DSpace. The conference had over 460 attendees from 38 countries discussing repository topics. DSpace version 4 was recently released with new features, and version 5 is planned for late 2014 focusing on ORCID support, metadata for all objects, and streaming audio/visuals. Jisc is working on a repository shared services project to integrate key repository services and support open access.
This document provides an overview of curation and the Omeka content management system. It discusses how curation involves collecting, organizing and displaying information. Omeka is introduced as a platform developed by the Center for History and New Media to publish digital collections and exhibitions. The document reviews Omeka's core features and functionality, provides examples of how it can be used for education, and gives a brief introduction to Dublin Core metadata standards for cataloging digital objects.
Graph Structure in the Web - Revisited. WWW2014 Web Science TrackChris Bizer
Â
The document discusses research that revisits the graph structure of the web using a new large crawl from Common Crawl. It finds that the web has become more dense and connected over time, with the largest strongly connected component growing significantly. While previous research found power laws for in- and out-degrees, this data does not fit power laws and instead has heavy-tailed distributions. The shape of the bow-tie structure also depends on the specific crawl used. The authors provide the new crawl data and analysis to enable further research on the evolving structure of the web graph.
Talk at the 3rd DBpedia Community Meeting in Dublin about the integration of the Web ProtÊgÊ ontology editor into DBpedia by the Corporate Semantic Web group at Freie Universität Berlin.
A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary ...Robert Meusel
Â
Promoted by major search engines, schema.org has become a widely adopted standard for marking up structured data in HTML web pages. In this paper, we use a series of largescale Web crawls to analyze the evolution and adoption of schema.org over time. The availability of data from dierent points in time for both the schema and the websites deploying data allows for a new kind of empirical analysis of standards adoption, which has not been possible before. To conduct our analysis, we compare dierent versions of the schema.org vocabulary to the data that was deployed on hundreds of thousands of Web pages at dierent points in time. We measure both top-down adoption (i.e., the extent to which changes in the schema are adopted by data providers) as well as bottom-up evolution (i.e., the extent to which the actually deployed data drives changes in the schema). Our empirical analysis shows that both processes can be observed.
Cataloging Landscape Update: RDA and LC Working Group on the Future of Biblio...kramsey
Â
The document summarizes the report from the Library of Congress Working Group on the Future of Bibliographic Control. The report recommends that bibliographic control become more collaborative, decentralized, web-based, and international in scope. It suggests making efficiency improvements, enhancing access to special collections, positioning technology and the community for the future, and strengthening the library and information science profession. Key themes are economics, standards, cooperation, users, and research. The LC plans to analyze the recommendations and work with the library community to respond and implement changes over time.
Presented at the International Internet Preservation Consortium (IIPC) Web Archiving Week, University of London, 16 June 2017.
Web archiving has become imperative to ensure that our digital heritage does not disappear forever, yet many institutions have not begun this work. In addition, archived websites are not easily discoverable, which severely limits their use. To address this challenge, OCLC Research has established the OCLC Research Library Partnership Web Archiving Metadata Working Group to develop a data dictionary that will be compatible with library and archives standards. Three reports on this project are available in July 2017, focused on metadata best practices guidelines, user needs and behaviors, and evaluation of web archiving tools.
More information: oc.lc/wam
Contact: Jackie Dooley, dooleyj@oclc.org
Web Archive Profiling Through Fulltext SearchSawood Alam
Â
Through analyzing full text search results from web archives, the authors developed a method called the Random Searcher Model (RSM) to efficiently generate profiles of web archive collections with low overhead. The profiles accurately predict an archive's likelihood of containing a URI's mementos while minimizing search costs. Different RSM modes allow customization based on collection characteristics. The authors recommend profile policies and RSM modes to balance accuracy, recall, and costs depending on available archive metadata. Future work includes combining profile attributes and evaluating profiles for applications beyond memento routing.
Looks at hyperlinks from the perspective of a managed collection of resources for which link persistence/integrity is considered a quality of service concern. Distinguishes between links into other managed collections and to the web at large. Considers link rot and content drift.
Introduction to Linked Data Platform (LDP) (Hector Correa)
The Linked Data Platform (LDP) defines rules for HTTP operations on web resources to provide an architecture for read-write Linked Data on the web. Key concepts include resources, RDF sources, non-RDF sources, and containers. LDP uses HTTP requests and responses to create, retrieve, update, and delete resources. Resources can be contained within different types of containers, including basic, direct, and indirect containers. LDP provides a standard way to manage Linked Data using HTTP.
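As a concrete illustration of creating a resource inside a Basic Container, the sketch below assembles the HTTP request an LDP client would send. The container URL, the `create_request` helper, and the Turtle body are hypothetical; the `Link` and `Slug` header conventions follow the LDP specification.

```python
# Sketch of an LDP create operation: POST to a Basic Container to mint a
# new member resource. The server URL and helper name are hypothetical.
LDP_RESOURCE = "http://www.w3.org/ns/ldp#Resource"

def create_request(container_url, slug, turtle_body):
    """Assemble an HTTP POST asking the container to create a new member."""
    return {
        "method": "POST",
        "url": container_url,
        "headers": {
            "Content-Type": "text/turtle",
            "Slug": slug,  # suggested name for the new resource
            "Link": f'<{LDP_RESOURCE}>; rel="type"',
        },
        "body": turtle_body,
    }

req = create_request("http://example.org/container/", "alice",
                     "<> a <http://xmlns.com/foaf/0.1/Person> .")
```

On success the server responds `201 Created` with a `Location` header naming the new resource, which then appears as an `ldp:contains` member of the container.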
This document describes a web service that analyzes web crawl data to provide contextual information about locations. It extracts topics like weather, healthcare, crime, and employment that are relevant to a given location from common crawl data stored on Amazon S3. The system uses Apache Pig on a Hadoop cluster to analyze the data, builds an index of locations to associated words, and makes the results searchable through Elastic Search. It aims to provide useful information to people moving to new places, policy makers, journalists, and researchers.
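The location-to-words index at the core of that pipeline can be sketched in miniature. The crawl snippets and topic vocabulary below are invented; the real system builds this index at scale with Apache Pig and serves it through Elasticsearch.

```python
# Toy sketch of the location-to-topic-words index described above;
# the documents and topic list are invented for illustration.
from collections import defaultdict

DOCS = [
    ("Norfolk, VA", "mild weather and growing healthcare employment"),
    ("Chicago, IL", "cold weather, strong employment, rising crime"),
]
TOPIC_WORDS = {"weather", "healthcare", "crime", "employment"}

def build_index(docs):
    """Map each location to the topic words found in its crawl text."""
    index = defaultdict(set)
    for location, text in docs:
        for word in text.replace(",", " ").split():
            if word in TOPIC_WORDS:
                index[location].add(word)
    return index

index = build_index(DOCS)
print(sorted(index["Chicago, IL"]))  # ['crime', 'employment', 'weather']
```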
Information sharing about Columbia University Library's recent web archiving ... (Anna Perricci)
This conference at Columbia University focused on web archiving tools and models. It featured presentations on projects funded by Mellon grants that developed new tools and collaborative models for web archiving. These included projects that expanded access to legal documents, created new platforms for storing and analyzing web archives, and developed tools for curating web archive collections. The conference provided an opportunity for participants to discuss challenges and opportunities for further collaboration in web archiving.
Slides for a workshop session on "Building an Accessible Digital Institution" facilitated by Brian Kelly, Innovation Advocate, Cetis at the Cetis conference held at the University of Bolton on 17-18 June 2014.
See http://www.slideshare.net/Thebriankelly/building-an-accessible-digital-institution
Aggregating Private and Public Web Archives Using the Mementity Framework (Mat Kelly)
This document outlines Mat Kelly's PhD dissertation defense. The defense will address aggregating private and public web archives using the Mementity framework. Kelly will defend his dissertation to a committee chaired by Michele Weigle on May 7, 2019. The dissertation addresses challenges around capturing and replaying private content from the web, including content behind authentication or that requires special handling when aggregated. It proposes research questions around difficult to archive content types, comparing browser and crawler capabilities, issues with authenticated content, signaling content that needs special handling, and access controls for private archives.
Archiving Web-Based #musetech for Institutional Memory (Samantha Norling)
Museum websites, blog and social media posts, gallery interactives, dashboards and microsites: these and other web-based content created by museum technologists contain a wealth of information about our institutions. Documenting everything from collections and exhibitions to public programs and staff activities, content created and shared on the web forms a vital part of a museum's institutional memory shared by its staff, audiences, and the communities of which it is a part.
While we'd like to think that web-based content and applications will live forever, the reality is that they often have a predetermined (or worse, unexpectedly shortened) active life on the web. Whether tied to a temporary exhibition or event, superseded by more current content, replaced by newer technologies, or fallen to technical obsolescence, retired web-based content can and should be archived for continued access to information in context.
This session will provide an overview of the web archiving landscape (best practices, available tools and resources, relevant initiatives). The web archiving activities of the Newfields Lab, in collaboration with Newfields Archives, will serve as a case study. To date, the Newfields web archives include imamuseum.org, various blogs, the IMA Dashboard, and exhibition-related interactives and microsites: content which now serves a variety of uses as archives.
This is usually a memorandum of understanding between the repository management team and the institution's research office, which library top management uses to assess the quality of the repository and whether the repository is meeting the institution's business or academic objectives.
Information technology and resources are an integral and indispensable part of the contemporary academic enterprise. In particular, technological advances have nurtured a new paradigm of data-intensive research. However, far too much of this activity still takes place in silos, to the detriment of open scholarly inquiry, integrity, and advancement. To counteract this tendency, the University of California Curation Center (UC3) has been developing and deploying a comprehensive suite of curation services that facilitate widespread data management, preservation, publication, sharing, and reuse. Through these services UC3 is engaging with new communities of use: in addition to its traditional stakeholders in cultural heritage memory organizations (e.g., libraries, museums, and archives), the UC3 service suite is now attracting significant adoption by research projects, laboratories, and individual faculty researchers. This webinar will present an introduction to five specific services (DMPTool, DataUp, EZID, Merritt, and the Web Archiving Service (WAS)) applicable to data curation throughout the scholarly lifecycle, two recent initiatives in collaboration with UC campuses (UC Berkeley Research Hub and UC San Francisco DataShare), and the ways in which they encourage and promote new communities of practice and greater transparency in scholarly research.
Capture All the URLs: First Steps in Web Archiving (Kristen Yarmey)
The document summarizes a webinar on getting started with web archiving. It discusses making the case for a web archiving program, selecting content, crawling and scoping websites, providing access to archived content, and building a sustainable program through policies, metadata, quality control, and addressing challenges. The webinar covered lessons learned and next steps such as additional outreach and exploring new technologies and uses for archived web content.
How you and your gateway can benefit from the services of the Science Gateway... (Katherine Lawrence)
January 2017 webinar of the Science Gateways Community Institute. Recording and additional details available at http://sciencegateways.org/upcoming-events/webinars/#previous
Web archiving challenges and opportunities (Ahmed AlSum)
The document discusses challenges and opportunities in web archiving. It outlines the key stages in the web archiving lifecycle including selection of content, harvesting techniques, storage formats and infrastructure, ways to provide access, and the role of community. Specific challenges are discussed such as representing dynamic and social media content, optimizing storage solutions, and addressing limitations of current access interfaces. Opportunities exist in focusing collection efforts on underrepresented regions, leveraging existing archived data, and developing innovative services and tools to support researchers.
The document discusses the challenges of preserving web resources and web services. It notes that the ubiquity of the web, complicated interactions of resources and services, and dynamic nature of web 2.0 technologies like user-generated content pose new preservation challenges. It provides contact information for the JISC PoWR project at UKOLN and ULCC, which aims to address these issues through workshops, reports, and an online blog.
Unleashing library services with web 2.0 (ss) (Dhanashree Date)
This presentation introduces some of the more often used Web 2.0 tools; examples illustrate appropriate use of these tools along with their benefits and downsides. A SWOT analysis provides different perspectives on embracing Web 2.0 in libraries, and the responsibilities that follow from adopting Web 2.0 are highlighted.
METRO Conference 2014: How collaboration can save [more of] the web: recent p... (Anna Perricci)
Note: these slides are very similar to another presentation with the same title presented at the Best Practices Exchange 2013 (some updates on the citation analysis project are in this presentation)
The goals of this presentation are to share case studies of evolving and thriving web archiving programs and inspire further discussion on how web archiving efforts can be strengthened through collaboration.
This document discusses building a software tool to archive websites using web crawling and blockchain technology. It proposes a system that crawls websites, stores web page content and metadata in WARC files, and records this information in a blockchain database with two layers - a domain blockchain to store domain information and a web content blockchain to store WARC files. This approach aims to provide a consistent and secure system for archiving websites while allowing users to monitor and analyze archived web content. The document reviews related work on web archiving and outlines the proposed system architecture and implementation requirements.
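A minimal hash-chained ledger conveys the two-layer idea: each block's hash covers its predecessor's hash and its payload, and a content block can reference a domain block by hash. This is a simplified sketch under invented field names, not the proposed system's implementation.

```python
# Minimal hash-chained ledger sketch for the two-layer design described
# above; the field names and payloads are illustrative.
import hashlib
import json

def make_block(prev_hash, payload):
    """Create a block whose hash covers the previous hash and payload."""
    block = {"prev": prev_hash, "payload": payload}
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

GENESIS = "0" * 64
# Domain layer: records information about the archived domain.
domain_block = make_block(GENESIS, {"domain": "example.com"})
# Content layer: records the WARC file and points back at the domain block.
content_block = make_block(GENESIS, {"warc": "example-20240101.warc.gz",
                                     "domain_ref": domain_block["hash"]})

def verify(block):
    """Recompute the hash to confirm the block was not tampered with."""
    body = {"prev": block["prev"], "payload": block["payload"]}
    return hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest() == block["hash"]
```

Because each hash covers the previous one, altering any archived WARC's metadata invalidates every later block, which is what gives the archive its tamper-evidence.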
Estermann Wikidata and Heritage Data 20170914 (Beat Estermann)
This document discusses Wikidata and cultural heritage data. It aims to establish Wikidata as a central hub for cultural heritage data by ingesting related data and enhancing it. Key challenges include getting institutions to provide open data, assisting with data scraping, addressing coverage biases, mapping data models during ingestion, and dealing with incorrect data. Maintaining data quality over time through processes like updating and dispute resolution is also challenging. The document explores how Wikidata can better integrate with other databases and cultural heritage organizations to maximize data sharing and reuse.
Slides from our tutorial on Linked Data generation in the energy domain, presented at the Sustainable Places 2014 conference on October 2nd in Nice, France
Web 2.0 refers to second-generation Internet-based services that emphasize online collaboration and sharing among users. It is characterized by dynamic or user-generated content and social media growth. Organizations can benefit from Web 2.0 through reduced costs, enhanced customer loyalty, and effective low-cost marketing. Popular Web 2.0 tools include blogs, wikis, RSS feeds, social bookmarking, social networking, online photo galleries, and audio/video casting. These tools encourage participation, collaboration, and sharing of information online.
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work... (Anna Perricci)
This is the main slide deck for a workshop at iPRES 2018 on human scale web collecting. A primary focus of the presentation was the use of Webrecorder.io, a free, open source web archiving tool available to all.
This document discusses OpenDaylight documentation. It provides an overview of OpenDaylight, an open source SDN project. It describes the OpenDaylight documentation workflow using tools like AsciiDoc, Git and Gerrit. It also explains the process for joining the OpenDaylight documentation community and contributing documentation changes.
BS 8878: Systematic Approaches to Documenting Web Accessibility Policies and ... (lisbk)
Slides for a workshop session on "BS 8878: Systematic Approaches to Documenting Web Accessibility Policies and Practices" facilitated by Brian Kelly at the IWMW 2015 event held at Edge Hill University, Ormskirk on 27 July 2015.
See http://iwmw.org/iwmw2015/talks/systematic-approaches-to-documenting-web-accessibility-policies-and-practices/
Similar to JCDL 2015 Doctoral Consortium - A Framework for Aggregating Private and Public Web Archives (20)
A Framework for Aggregating Public and Private Web Archives (Mat Kelly)
This document proposes a framework for aggregating private and public web archives. It discusses the current state of memento aggregation and outlines ways to make timemaps and aggregation more expressive. This includes adding attributes to timemaps to provide more information about mementos without requiring full dereferencing, such as status codes, content digests, and indicators of private versus public captures. The framework aims to provide a more comprehensive view of the archived web by incorporating both personal and non-aggregated archives.
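To make the idea of a more expressive TimeMap concrete, the sketch below serializes a single memento entry in `application/link-format` with extra attributes attached. The attribute names `status` and `access`, the helper function, and the URLs are illustrative assumptions, not a finalized specification.

```python
# Sketch of an extended TimeMap entry; the extra attributes (`status`,
# `access`) and the URLs are illustrative, not a finalized spec.
def memento_link(uri_m, datetime_str, **attrs):
    """Serialize one memento as an application/link-format line,
    appending any extra attributes after the standard ones."""
    parts = [f"<{uri_m}>", 'rel="memento"', f'datetime="{datetime_str}"']
    parts += [f'{k}="{v}"' for k, v in attrs.items()]
    return "; ".join(parts)

line = memento_link("http://archive.example/web/20190507/http://a.com/",
                    "Tue, 07 May 2019 12:00:00 GMT",
                    status="200", access="private")
```

A client reading such a TimeMap could, for instance, skip dereferencing entries marked `access="private"` for which it holds no credentials, without issuing a single extra request.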
Exploring Aggregation of Personal, Private, and Institutional Web Archives (Mat Kelly)
Mat Kelly presented a framework for aggregating personal, private, and institutional web archives while maintaining access control. The framework includes separate timemaps for different types of captures that could be aggregated while restricting access to private captures. Kelly sought input on use cases around access control for private web archives and mechanisms for protecting archived web pages. The presentation explored challenges in replaying private archives alongside public ones from institutions and how the framework could address these issues.
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C... (Mat Kelly)
The document describes a system for generating thumbnail summaries of large collections of web archive mementos. The system uses SimHash to identify sufficiently unique mementos based on similarities and differences in HTML markup. It calculates Hamming distance between memento SimHashes to select a subset for the summary that limits redundancy while preserving important captures. The visualizations generated by the system provide an overview of a website's evolution over time using 3-6 representative thumbnails.
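The selection step can be sketched as follows: compute a SimHash signature per memento, then keep a memento only if its Hamming distance to every already-kept signature exceeds a threshold. This is a simplified illustration of the general technique, not the system's actual code; the 64-bit width, MD5 token hashing, and threshold are arbitrary choices.

```python
# Simplified SimHash de-duplication in the spirit of the system above;
# bit width, token hash, and threshold are illustrative choices.
import hashlib

def simhash(tokens, bits=64):
    """Classic SimHash: each token's hash casts a signed vote per bit."""
    vote = [0] * bits
    for tok in tokens:
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        for i in range(bits):
            vote[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if vote[i] > 0)

def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")

def select_unique(mementos, threshold=3):
    """Keep a memento only if it differs enough from all kept ones;
    returns how many survive. `mementos` is a list of token lists."""
    kept = []
    for tokens in mementos:
        sig = simhash(tokens)
        if all(hamming(sig, s) > threshold for s in kept):
            kept.append(sig)
    return len(kept)
```

Near-duplicate captures hash to nearby signatures, so raising the threshold shrinks the summary while a threshold of zero keeps every distinct capture.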
The document provides an overview of browser-based digital preservation including:
- The current state of digital preservation which relies on web crawlers and archives like the Internet Archive. However, this approach is insufficient for preserving pages that are not popular, behind authentication, or use complex JavaScript.
- The requirements for new software to directly capture and preserve web pages from within the browser in order to address the limitations of current archival approaches.
- A proposed system called "WARCreate" that would leverage the Chrome extension API to capture web pages and resources and generate WARC files for preservation while maintaining the original browsing context.
Archive What I See Now - Archive-It Partner Meeting 2013 (Mat Kelly)
This document summarizes a presentation about enabling individual web archiving. It discusses tools like WARCreate and WAIL that allow users to archive web pages from their browser in WARC format. Issues addressed include timely capture of breaking news, preserving original context like user profiles, and uploading personal archives to institutional archives. Goals of the Archive What I See Now project are to port WARCreate to Firefox, add capabilities to upload WARCs, and implement sequential archiving of linked resources.
IEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction System (Mat Kelly)
This document describes a graph-based visualization system for navigating and predicting box office performance. The system represents movie data as interconnected nodes in a graph layout. Selecting different nodes allows navigation between the movie context and related contexts like actors. Node size and position encode attributes relevant to box office predictions. The system preprocesses and caches external data to make complex predictions accessible through an interactive visual interface.
The document introduces WARCreate and WAIL, tools that make web archiving easier. WARCreate allows users to archive web pages they see in their browser directly as WARC files, preserving context. WAIL packages existing tools like Heritrix and Wayback into a graphical user interface, allowing one-click archiving. Together these tools aim to make web archiving more accessible to personal archivists while still producing outputs compatible with institutional tools and standards.
Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving (Mat Kelly)
The document describes a set of tools that make enterprise-level web archiving accessible for personal use. The tools include a crawler (Heritrix), a web archive player (Wayback Machine), and an archive inspector (WARC-Proxy) that are installed locally on a personal machine. The interface provides one-click options to set up crawls, view archived pages in the local Wayback installation, and check archive status. It aims to support personal web archiving through a graphical user interface that allows customizing crawls, starting/stopping services, and works with existing WARC files from other tools on Windows, MacOS, and Linux systems.
An Extensible Framework for Creating Personal Web Archives of Content Behind ... (Mat Kelly)
The document is a thesis that aims to develop an extensible framework for creating personal web archives of content behind authentication barriers. It discusses problems with current tools for personal web archiving, such as them breaking when sites change hierarchies and producing suboptimal archives. The thesis seeks to remedy such issues, preserve more social media content, and make archiving outputs more optimal. It utilizes tools like Archive Facebook and WARCreate to generate navigable archives in a format compatible with replay systems like Wayback Machine.
If Twitter is the "first draft of history," then we should be doing a better job of preserving it. For the one-year anniversary of the Egyptian revolution (2012), we revisited a sample of the shared social media content and found nearly 11% missing from the current web, with only 20% available in public web archives. Spurred by this, we sampled tweets for five other culturally important events from 2009-2012 and found similar rates of archiving and loss.
WARCreate - Create Wayback-Consumable WARC Files from Any Webpage (Mat Kelly)
The Internet Archive's Wayback Machine is the most common way that typical users interact with web archives. The Internet Archive uses the Heritrix web crawler to transform pages on the publicly available web into Web ARChive (WARC) files, which can then be accessed using the Wayback Machine. Because Heritrix can only access the publicly available web, many personal pages (e.g., password-protected pages, social media pages) cannot be easily archived into the standard WARC format. We have created a Google Chrome extension, WARCreate, that allows a user to create a WARC file from any webpage. Using this tool, content that might otherwise have been lost in time can be archived in a standard format by any user. This tool provides a way for casual users to easily create archives of personal online content. It is one of the first steps in resolving issues of long-term storage, maintenance, and access of personal digital assets that have emotional, intellectual, and historical value to individuals.
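The WARC format referenced above is plain enough to sketch by hand. The minimal record below follows the general WARC record layout (version line, named headers, blank line, payload) but abbreviates it: real tools such as WARCreate and Heritrix also emit headers like `WARC-Record-ID` and `WARC-Date`, and the example URI and payload are invented.

```python
# Hand-rolled sketch of a single WARC response record; abbreviated
# (real writers add WARC-Record-ID, WARC-Date, digests, etc.).
def warc_record(target_uri, http_payload: bytes) -> bytes:
    """Build one response record: headers, blank line, payload, trailer."""
    headers = (
        "WARC/1.0\r\n"
        "WARC-Type: response\r\n"
        f"WARC-Target-URI: {target_uri}\r\n"
        "Content-Type: application/http; msgtype=response\r\n"
        f"Content-Length: {len(http_payload)}\r\n"
        "\r\n"
    )
    return headers.encode() + http_payload + b"\r\n\r\n"

rec = warc_record("http://example.com/",
                  b"HTTP/1.1 200 OK\r\n\r\n<html>hi</html>")
```

A WARC file is simply a concatenation of such records, which is why replay systems like the Wayback Machine can consume output from many different capture tools.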
NDIIPP/NDSA 2011 - YouTube Link Restoration (Mat Kelly)
Creating Persistent Links to YouTube Music Videos
The document discusses the problem of links to YouTube videos becoming invalid when videos are removed. It proposes introducing a resolver service that redirects links to alternative copies of videos when the original link returns a 404 error. This service would also retrieve and publish metadata about videos to external websites to help find available copies when the initial link is broken. The goal is to create persistent links to YouTube music videos even if the specific video is removed from YouTube.
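The resolver's core decision can be sketched in a few lines: pass working links through unchanged, and on a 404 fall back to a known alternative copy. The alternatives table, identifiers, and URLs below are hypothetical placeholders for the metadata the service would gather.

```python
# Sketch of the proposed resolver's redirect decision; the alternatives
# table and all identifiers/URLs are hypothetical.
ALTERNATES = {"yt:abc123": ["http://mirror.example/abc123"]}

def resolve(video_id, original_url, status_code):
    """Return the URL a persistent link should redirect to, or None."""
    if status_code != 404:
        return original_url              # original copy still works
    copies = ALTERNATES.get(video_id, [])
    return copies[0] if copies else None  # no known copy: give up

print(resolve("yt:abc123", "http://youtube.example/abc123", 404))
```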
Archive Facebook is an add-on for Mozilla Firefox that allows users to create stand-alone archives of the content on their Facebook account. It preserves the look and feel of Facebook, unlike Facebook's native downloading option. The add-on lets users choose what specific types of content to archive, rather than limiting it to what Facebook allows. This ensures the archive is a true snapshot of the user's Facebook data and history. The add-on provides an easy-to-use interface to navigate and access archived content.
JCDL 2015 Doctoral Consortium - A Framework for Aggregating Private and Public Web Archives
1. A Framework for Aggregating
Private and Public Web Archives
Mat Kelly
Old Dominion University, Norfolk, VA
Advisor: Michele C. Weigle
JCDL 2015 Doctoral Consortium
June 21, 2015
2. The Problem
[Diagram: isolated private archives ("private archive", "other private archive") with no aggregation among them]
3. All Archives Cannot Be Aggregated
[Diagram: a TimeMap aggregates public captures, but the private archives and other private archives remain outside the aggregation]
14. [Timeline diagram: a TimeMap spanning captures from 1 year ago, 2 years ago, 10 years ago, … to 180 years ago]
16. Proactive Preservation
• Just-in-time WARC creation
• Personal and potentially private web archiving
• Mitigates the deferral problem
17. Public vs. Private Web Archiving
• Public Web Archiving
– Relies on deferred capture
– Uses WARC, Memento, etc.
– Integrates with other public web archives
• Private Web Archiving
– Same tools, less overhead, less bureaucracy
– Uses WARC, Memento, etc.
– Does not integrate
18. Typical Web Archive Access
1. Web User Interface
2. Memento
– TimeGate (URI-G)
– TimeMap
– Accept-Datetime (content negotiation)
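The Memento access pattern above relies on datetime content negotiation (RFC 7089): a client asks a TimeGate (URI-G) for the capture of a resource closest to a desired datetime via the `Accept-Datetime` header. A minimal sketch of building such a request, assuming a hypothetical TimeGate endpoint at `example.org`:

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def timegate_request(uri_g: str, uri_r: str, when: datetime) -> dict:
    """Build a Memento datetime-negotiation request (RFC 7089).

    The TimeGate (URI-G) would respond 302 with a Location pointing at
    the memento (URI-M) whose datetime is closest to Accept-Datetime.
    """
    return {
        "url": f"{uri_g.rstrip('/')}/{uri_r}",
        "headers": {
            # Desired datetime in RFC 1123 format, GMT, per RFC 7089
            "Accept-Datetime": format_datetime(when, usegmt=True),
        },
    }

req = timegate_request(
    "http://example.org/timegate",   # hypothetical TimeGate base URI
    "http://www.cnn.com/",
    datetime(2015, 3, 5, tzinfo=timezone.utc),
)
print(req["headers"]["Accept-Datetime"])  # Thu, 05 Mar 2015 00:00:00 GMT
```

An actual client would then issue this request with any HTTP library and follow the redirect to the selected memento.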
19. Aggregating Multiple Web Archives
• Memento Aggregator
– Temporally sorted TimeMap combined from multiple archives
– Allows temporal gaps in one archive to be filled in by another
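The aggregator's core operation, merging per-archive TimeMaps into one temporally sorted TimeMap, can be sketched as follows (a simplification with hypothetical archive URIs; a real aggregator exchanges Link-format TimeMaps with many more attributes):

```python
from datetime import datetime

# Each archive contributes (memento URI, capture datetime) pairs.
archive_a = [
    ("http://archive-a.example/web/20150228/http://cnn.com/", datetime(2015, 2, 28)),
    ("http://archive-a.example/web/20150310/http://cnn.com/", datetime(2015, 3, 10)),
]
archive_b = [
    ("http://archive-b.example/web/20150305/http://cnn.com/", datetime(2015, 3, 5)),
]

def aggregate(*timemaps):
    """Merge per-archive TimeMaps into one list sorted by capture
    datetime, so a temporal gap in one archive is filled by another."""
    merged = [memento for tm in timemaps for memento in tm]
    return sorted(merged, key=lambda memento: memento[1])

combined = aggregate(archive_a, archive_b)
# Archive B's March 5 capture fills the gap between A's Feb 28 and Mar 10.
```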
20. Archive Supplementation
• More captures → greater temporal coverage
• Content on the Deep Web
• A large chunk of the Web is not preserved
– Tools' inability
– Inconsistency over time due to personalization
21. Concerns in Aggregating Private Web Archives
• Privacy
• Inconsistency of page representation
– URI is an insufficient key for access
• Archival integrity
– Has the private archive's content been manipulated?
22. Why Individuals Might Want Personalized Aggregations
• Show my private web archive captures
• Concerned about exposing sensitive info to the public
– But still want to view captures temporally inline
• Private/restricted archives are becoming ever more common
24. My Archives Have
What They May Have Missed
25. The Concerns Distilled
• Access Control
– And indicators for PWAs
• Preservation of Private Content
• Interoperability without privacy compromise
26. Web Archive Usage Pattern 1: Direct Access
[Diagram: the user queries a web archive directly, or retrieves its TimeMap]
27. Web Archive Usage Pattern 2: Web Archive Aggregation
• Better results for a URI due to more sources for capture
28. Previous Patterns: Status Quo
• Patterns 1 and 2 are the status quo
– Provided by the framework
• Querying web archives currently only considers public web content
– URI for lookup
• Framework introduces 2 new entities
– Memento Meta Aggregator (MMA)
– Private Web Archive Adapter (PWAA)
29. Memento Meta Aggregator (MMA)
• Functional superset of the Memento Aggregator (MA)
• Can act as an intermediary client to relay MA results to the ultimate user
• Allows just-in-time (JIT) inclusion of archives
– As specified at query time
• Set of archives aggregated can be dynamic
– e.g., results must not include IA
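The JIT inclusion and dynamic exclusion described above could be sketched as query-time selection of per-archive TimeMap endpoints. This is an illustrative sketch, not the author's implementation; the archive names, the `mine` endpoint, and the exact TimeMap URL layouts are assumptions:

```python
# Hypothetical MMA configuration: base TimeMap endpoints per archive.
DEFAULT_ARCHIVES = {
    "ia": "http://web.archive.org/web/timemap/link/",
    "archive-it": "http://wayback.archive-it.org/all/timemap/link/",
}

def mma_endpoints(uri_r, include=None, exclude=()):
    """Resolve the set of per-archive TimeMap endpoints for one query.

    `include` adds archives just-in-time; `exclude` removes them
    dynamically (e.g., 'results must not include IA')."""
    selected = dict(DEFAULT_ARCHIVES)
    if include:
        selected.update(include)      # JIT inclusion of extra archives
    for name in exclude:
        selected.pop(name, None)      # dynamic exclusion
    return [base + uri_r for base in selected.values()]

# Exclude the Internet Archive, include a personal archive:
eps = mma_endpoints(
    "http://cnn.com/",
    include={"mine": "http://users2machine.local/timemap/link/"},
    exclude=["ia"],
)
```

The MMA would then fetch each endpoint, merge the TimeMaps, and relay the aggregate to the user, exactly as a plain MA does for its fixed archive set.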
30. Aggregating My Captures
[Diagram: various public web archives alongside my web archives ("My CNN Captures", "My Bank Captures")]
31. The Current Memento Aggregator
[Diagram: the aggregator draws on public archives holding 100, 30, and 10 captures; "My CNN Captures" and "My Bank Captures" sit outside it]
32. Accessing the Aggregator
[Diagram: the user queries the aggregator, which consults public archives holding 100, 30, and 10 captures]
33. Accessing the Aggregator …does not include our archives
[Diagram: the aggregator returns 140 captures (100 + 30 + 10) from the public archives; "My CNN Captures" and "My Bank Captures" are NOT AGGREGATED]
34. Pattern 3: Aggregator Relay – Access via the Meta Aggregator
[Diagram: the MMA relays the aggregator's 140 captures to the user]
35. Web Archive Usage Pattern 4: Including Additional Archives in Aggregation
[Diagram: access via the Meta Aggregator allows our archives to be included – 140 public captures plus 15 of ours yield 155]
36. MMAs Allow Our Public Captures to be Shared
[Diagram: the MMA's combined TimeMap of 155 captures can be relayed to other users]
37. Web Archive Usage Pattern 5: Recursive MMA Access
[Diagram: MMAs are composed recursively – an inner MMA contains archives C and D, the next contains B, C, and D, and the outermost contains A, B, C, and D (drawing on Bob's public captures and the organization's public captures 1 and 2), with capture counts accumulating at each level (e.g., 15 + 20 = 35; 35 + 15 = 50)]
38. New Framework Entity 1: Memento Meta Aggregator
• Allows a dynamic and JIT set of archives
• Superset can be recursively constructed
• Sets can be shared
"My public captures can be integrated with public web archives'"
39. Private Web Archive Adapter (PWAA)
• Regulates access to Private Web Archives (PWAs)
• Acts as token authorizer
• With credentials OK, relays results as if querying the PWA directly
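The PWAA's two roles (token authorizer, then relay) can be sketched as below. This is an illustrative sketch, not the author's implementation; the class shape and token format are assumptions, mirroring the key/token exchange shown on the following slides:

```python
import secrets

class PWAA:
    """Sketch of a Private Web Archive Adapter: exchanges a pre-shared
    key for a reusable token, then relays TimeMap queries to the PWA
    only when a valid token is presented."""

    def __init__(self, key, archive):
        self._key = key          # credential established with the PWA owner
        self._tokens = set()
        self._archive = archive  # callable: uri_r -> list of mementos

    def get_token(self, key):
        """GET TOKEN for PWA: return a reusable token, or None if denied."""
        if key != self._key:
            return None
        token = secrets.token_hex(8)
        self._tokens.add(token)
        return token

    def query(self, token, uri_r):
        """GET mementos for URI: relay to the PWA only for valid tokens;
        an invalid token yields 0 mementos rather than an error."""
        if token not in self._tokens:
            return []
        return self._archive(uri_r)

pwaa = PWAA("abcd1234", lambda uri: ["m1", "m2", "m3"])  # 3 private captures
tok = pwaa.get_token("abcd1234")
```

Returning an empty result for a bad token (instead of UNAUTHORIZED) is one of the security trade-offs raised later under Future Research Questions.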
40. User Establishes Access with PWA
[Diagram: the user sends "GET TOKEN for PWA, Key: abcd1234"; the private archives hold 3 and 10,000 captures, the public archives 100, 30, and 10]
41. MMA Relays Request
[Diagram: the MMA relays "GET TOKEN for PWA, Key: abcd1234" to the PWAA]
42. PWAA Accepts Request, Generates Reusable Token
[Diagram: the PWAA responds "ACCESS OK, Token: 4f33c64"]
43. User Submits Request for URI-R with Token
[Diagram: the user sends "GET mementos for URI, Token: 4f33c64"]
44. MMA Relays Request (again)
[Diagram: the MMA forwards "GET mementos for URI, Token: 4f33c64" to the PWAA]
45. PWAA Verifies & Relays Request; MA Gets Mementos, per Usual
[Diagram: the PWAA checks the token (OK) and relays "GET mementos for URI" to the private archive, while the MA issues "GET mementos for URI" to the public archives as usual]
46. Archives Return Mementos
[Diagram: with the token verified (OK), the public and private archives return their mementos for the URI]
47. PWAA Relays TimeMap
[Diagram: TimeMaps flow back to the MMA – 140 captures from the public archives plus 3 and 10,000 from the private archives, 10,143 in total]
48. MMA Annotates and Aggregates
[Diagram: the MMA combines the TimeMaps – 140 public captures plus the 3 and 10,000 private captures – into a single aggregate of 10,143]
49. Web Archive Usage Pattern 6: Aggregating Public & Private Archives
[Diagram: the final TimeMap contains 10,143 captures drawn from both public and private archives]
50. Regulated Access Can Be Shared
[Diagram: a second user queries with their own token ("GET mementos for URI, Token: c5463b4"); a third user presenting a bad key ("GET TOKEN for PWA, Key: 2265eef3") gets no/invalid token returned and is denied access or sees 0 mementos]
51. Aggregating Multiple PWAs
[Diagram: the user requests tokens for several PWAs – "Key: abcd1234, Archive: My", "Key: cab45cbf, Archive: Linda", "Key: b0b01b, Archive: Bob" – covering private archives of 3, 5, and 10 captures]
52. Aggregating Multiple PWAs
[Diagram: My archive and Bob's grant access ("Access OK, Token: 7790ca" and "Access OK, Token: b0b01b"); Linda's returns ACCESS DENIED]
53. PWAs Can Then be Aggregated
[Diagram: "GET mementos for URI" with "Token: 7790ca, Archive: My", "Token: null, Archive: Linda", and "Token: b0b01b, Archive: Bob" yields 3 + 10 = 13 mementos; Linda's 5 captures are excluded]
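The multi-PWA aggregation above can be sketched as: one token per private archive, with a null or invalid token simply contributing 0 mementos from that archive. The token values and capture counts mirror the diagram; everything else (function names, data shapes) is an assumption:

```python
def pwa_query(token, valid_token, captures):
    """One PWA behind a PWAA: returns its captures only for a valid token."""
    return captures if token == valid_token else []

archives = {
    "my":    dict(valid="7790ca", captures=["m1", "m2", "m3"]),             # 3
    "linda": dict(valid="5eef31", captures=["l1", "l2", "l3", "l4", "l5"]), # 5
    "bob":   dict(valid="b0b01b", captures=[f"b{i}" for i in range(10)]),   # 10
}
# Linda's PWAA denied access, so no token was obtained for her archive.
tokens = {"my": "7790ca", "linda": None, "bob": "b0b01b"}

combined = [m for name, a in archives.items()
            for m in pwa_query(tokens.get(name), a["valid"], a["captures"])]
# 3 + 10 = 13 mementos; Linda's 5 captures are excluded.
```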
54. Sample TimeMap
...
, <http://web.archive.org/web/20150228155703/https://facebook.com/>;rel="memento"; datetime="Sat, 28 Feb 2015 15:57:03 GMT"
, <http://web.archive.org/web/20150228163939/http://www.facebook.com/>;rel="memento"; datetime="Sat, 28 Feb 2015 16:39:39 GMT"
, <http://web.archive.org/web/20150303162841/https://www.facebook.com/>;rel="memento"; datetime="Tue, 03 Mar 2015 16:28:41 GMT"
, <http://users2machine.local/web/20150305000101/https://www.facebook.com/>;rel="memento"; datetime="Thu, 05 Mar 2015 00:01:00 GMT"; key="e395935019ee467c797034ee410cc91e"
, <//wayback.archive-it.org/all/20150305215922/https://facebook.com/>;rel="memento"; datetime="Tue, 05 Mar 2015 21:59:22 GMT"
, <http://previouslyUnaggregated.org/web/20150306123457/https://www.facebook.com/>;rel="memento"; datetime="Wed, 06 Mar 2015 12:34:57 GMT"
, <http://web.archive.org/web/20150310140721/https://www.facebook.com/>;rel="memento"; datetime="Tue, 10 Mar 2015 14:07:21 GMT"
...
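Each line of the sample TimeMap is a Link-format (RFC 8288) entry; the `key` attribute is not standard Link syntax but the extension this framework adds for private mementos. A sketch of parsing one such entry (the parsing approach is illustrative, not the author's code):

```python
import re

timemap_entry = (
    '<http://users2machine.local/web/20150305000101/https://www.facebook.com/>'
    ';rel="memento"; datetime="Thu, 05 Mar 2015 00:01:00 GMT"'
    '; key="e395935019ee467c797034ee410cc91e"'
)

def parse_entry(entry):
    """Parse one Link-format TimeMap entry into a dict of its
    <URI> target and its name="value" attributes."""
    uri = re.match(r"<([^>]+)>", entry).group(1)
    params = dict(re.findall(r'(\w+)="([^"]*)"', entry))
    params["uri"] = uri
    return params

m = parse_entry(timemap_entry)
# m["key"] identifies this as an access-controlled private memento.
```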
55. Access Token Included in TimeMap
[Same TimeMap as on slide 54; the users2machine.local memento – MY PRIVATE FACEBOOK CAPTURES – carries the access attribute key="e395935019ee467c797034ee410cc91e"]
56. My Public Web Archive, Now Aggregated
[Same TimeMap; the previouslyUnaggregated.org entry – MY PUBLIC FACEBOOK CAPTURES – is now included in the aggregation]
57. Evaluation Plan
• How effective is the framework?
• What are the scalability ramifications of the additional infrastructure?
• Is public-private tokenization the most suitable method for persistent access?
• How can a single archive be sub-divided between private/public and access controlled?
58. Previous Work
PUBLICATIONS
Preservation and Replay
• PDA 2013 - Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving
• JCDL 2012 - WARCreate - Create Wayback-Consumable WARC Files from Any Webpage
Evaluating Capture
• IJDL 2015 - Not All Mementos Are Created Equal: Measuring The Impact Of Missing Resources
• IJDL 2015 - The Impact of JavaScript on Archivability
• JCDL 2014 - Not All Mementos Are Created Equal: Measuring The Impact Of Missing Resources
• JCDL 2014 - The Archival Acid Test: Evaluating Archive Performance on Advanced HTML and JavaScript
• D-Lib 2013 - A Method for Identifying Personalized Representations in the Archives
• TPDL 2013 - On the Change in Archivability of Websites Over Time
Archival Integration
• JCDL 2015 - Mobile Mink: Merging Mobile and Desktop Archived Webs
• JCDL 2014 - Mink: Integrating the Live and Archived Web Viewing Experience Using Web Browsers and Memento
SOFTWARE PRODUCTS
• WARCreate – preserve from the browser
• WAIL – private web archiving all-in-one suite
• Mink – integrate the live and archived web
59. Current Work
• Other approaches of archival lookup beyond URI
• Appropriate metadata to indicate private web content in WARC files
• Existing integration attempts by private web archives & individuals
60. Dissertation Plan
✓ Background Research
✓ PhD Requirements (Coursework, Qualifying Exam, etc.)
✓ Build preliminary framework model
☐ JCDL Doctoral Consortium
EXTENDED RESEARCH
• Research prevalence of private web archives
• Research access control methods in web archiving and other domains
• Investigate other access patterns and expound on those defined
• PhD Candidacy Exam describing merit of research plan
• Implement feedback received from candidacy exam committee
• Programmatically implement MMA and PWAA
CASE STUDIES (real-world application)
• Publicly Available Non-Aggregated Archives (e.g., Rhizome)
• Deep web preservation/access (bank account/Facebook feeds)
• DISSERTATION DEFENSE
61. Preliminary Publication Plan
• JCDL 2016 – Evaluation of User Access Patterns for Private Web Archives
• TPDL 2016 – Methods in adding JIT Inclusion of Private Web Archives in Memento
• ACM SACMAT* – Research exploring tokenization and similar methods for archival access establishment
• iPres 2016 – Research investigating URI clash & other needed identifiers for distinguishing archived content from the "deep web" with archived content from the public live web
* Symposium on Access Control Models and Technologies
62. Future Research Questions
• Can a PWAA perform content negotiation [1] on the private-public spectrum?
• What level of security is needed?
– e.g., reporting UNAUTHORIZED vs. 0 mementos
[1] RFC 2295, https://www.ietf.org/rfc/rfc2295.txt
63. Summation
• Why?
– No means exists to integrate private and public web archives.
• How to Evaluate?
– Does this framework fit real-world needs? Is it scalable?
• When will I know I am done?
– Any public/private web archive* can be integrated.
* -compliant
64. References
• D. Abrams, R. Baecker, and M. Chignell. Information Archiving with Bookmarks: Personal Web Space Construction and Archiving. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 41–48, 1998.
• A. AlSum, M. Weigle, M. Nelson, and H. Van de Sompel. Profiling Web Archive Coverage for Top-Level Domain and Content Language. International Journal on Digital Libraries, 14(3-4):149–166, 2014.
• J. F. Brunelle, M. Kelly, H. SalahEldeen, M. C. Weigle, and M. L. Nelson. Not All Mementos Are Created Equal: Measuring The Impact Of Missing Resources. In Proceedings of JCDL, pages 321–330, London, England, 2014.
• J. F. Brunelle, M. Kelly, M. C. Weigle, and M. L. Nelson. The Impact of JavaScript on Archivability. International Journal on Digital Libraries, pages 1–23, 2015.
• J. F. Brunelle and M. L. Nelson. An Evaluation of Caching Policies for Memento TimeMaps. In Proceedings of JCDL, pages 267–276, 2013.
• D. Gomes, S. Freitas, and M. J. Silva. Design and Selection Criteria for a National Web Archive. In Research and Advanced Technology for Digital Libraries, pages 196–207. Springer, 2006.
• D. Hardt. The OAuth 2.0 Authorization Framework. IETF RFC 6749, October 2012.
• M. Jones and D. Hardt. The OAuth 2.0 Authorization Framework: Bearer Token Usage. IETF RFC 6750, October 2012.
• M. Kelly, J. F. Brunelle, M. C. Weigle, and M. L. Nelson. A Method for Identifying Personalized Representations in the Archives. D-Lib Magazine, 19(11/12), Nov/Dec 2013.
• M. Kelly, J. F. Brunelle, M. C. Weigle, and M. L. Nelson. On the Change in Archivability of Websites Over Time. In Proceedings of the International Conference on Theory and Practice of Digital Libraries (TPDL), pages 35–47, Valletta, Malta, 2013.
• M. Kelly, M. L. Nelson, and M. C. Weigle. Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving Using XAMPP. Poster and demo presented at Personal Digital Archiving, February 2013.
• M. Kelly, M. L. Nelson, and M. C. Weigle. The Archival Acid Test: Evaluating Archive Performance on Advanced HTML and JavaScript. In Proceedings of JCDL, pages 25–28, London, England, September 2014.
65. References
• M. Kelly and M. C. Weigle. WARCreate - Create Wayback-Consumable WARC Files from Any Webpage. In Proceedings of JCDL, pages 437–438, Washington, DC, June 2012.
• C. C. Marshall. Rethinking Personal Digital Archiving, Part 1. D-Lib Magazine, 14(3/4), Mar/Apr 2008.
• C. C. Marshall. Rethinking Personal Digital Archiving, Part 2. D-Lib Magazine, 14(3/4), Mar/Apr 2008.
• J. Niu. Functionalities of Web Archives. D-Lib Magazine, 18(3/4), Mar/Apr 2012.
• M. Phillips. PANDORA, Australia's Web Archive, and the Digital Archiving System that Supports It. http://pandora.nla.gov.au/pandas.html, 2003.
• H. C.-H. Rao, Y.-F. Chen, and M.-F. Chen. A Proxy-based Personal Web Archiving Service. SIGOPS Oper. Syst. Rev., 35(1):61–72, Jan. 2001.
• A. Rauber, M. Kaiser, and B. Wachter. Ethical Issues in Web Archive Creation and Usage – Towards a Research Agenda. In 8th International Web Archiving Workshop (IWAW08), 2008.
• D. Rosenthal. Re-thinking Memento Aggregation. http://blog.dshr.org/2013/03/re-thinking-memento-aggregation.html, 2013.
• T. Schwarz, M. Baker, S. Bassi, B. Baumgart, W. Flagg, C. van Ingen, K. Joste, M. Manasse, and M. Shah. Disk Failure Investigations at the Internet Archive. In Work-in-Progress session, NASA/IEEE Conference on Mass Storage Systems and Technologies (MSST2006), 2006.
• S. Strodl, F. Motlik, K. Stadler, and A. Rauber. Personal & SOHO Archiving. In Proceedings of JCDL, pages 115–123, 2008.
• M. Thelwall and L. Vaughan. A Fair History of the Web? Examining Country Balance in the Internet Archive. Library & Information Science Research, 26(2):162–176, 2004.
• B. Tofel. "Wayback" for Accessing Web Archives. In 7th International Web Archiving Workshop (IWAW07), 2007.
• H. Van de Sompel, M. Nelson, and R. Sanderson. HTTP Framework for Time-Based Access to Resource States – Memento. IETF RFC 7089, December 2013.
• T. Wang, M. Srivatsa, and L. Liu. Fine-Grained Access Control of Personal Data. In Proceedings of the 17th ACM Symposium on Access Control Models and Technologies, pages 145–156, 2012.
66. A Framework for Aggregating
Private and Public Web Archives
Mat Kelly
Old Dominion University, Norfolk, VA
Advisor: Michele C. Weigle
JCDL 2015 Doctoral Consortium
June 21, 2015