This document provides an overview of ResourceSync, which is a framework for synchronizing web resources between systems. Some key points:
- ResourceSync was created to address limitations of existing protocols like OAI-PMH by allowing synchronization of any web resource and enabling both one-time and ongoing synchronization.
- It supports various capabilities for synchronization like resource lists, change lists, and notifications. These can be used for initial synchronization or incremental updates.
- Real-world examples are described where ResourceSync has been implemented for projects involving aggregation of digital collections, like Europeana and CLARIAH. It facilitates synchronization between diverse data sources.
- Related presentations describe how ResourceSync could also be useful in adjacent contexts, such as next-generation repository harvesting and discovery of IIIF resources.
Mind the gap! Reflections on the state of repository data harvestingSimeon Warner
A 24x7 presentation at Open Repositories 2017 in Brisbane, Australia.
I start with an opinionated history of the evolution of repository data harvesting since the late 1990's to the present. A conclusion is that we are currently in danger of creating a repository environment with fewer cross-repository services than before, with the potential to reinforce the silos we hope to open. I suggest that the community needs to agree upon a new solution, and further suggest that solution should be ResourceSync.
Maintaining scholarly standards in the digital age: Publishing historical gaz...Humphrey Southall
This presentation: (1) Discusses why providing detailed attributions of individual contributions is essential to large scale sharing of historical research data; (2) Provides a short introduction to Open Linked Data; (3) Introduces the PastPlace Gazetteer API (Applications Programming Interface), explaining components of the RDF it generates using the example of Oxford, UK; (4) Notes that most open data projects use the Creative Commons -- Must Acknowledge license (CC-BY) while not actually acknowledging contributors within their RDF, then shows how we do it; (5) Introduces the separate PastPlace Datafeed API, which implements the W3C Datacube Vocabulary.
This presentation introduces ResourceSync, a specification aimed to enable web-based synchronization of resources. The specification is the result of a collaboration between NISO and the Open Archives Initiative funded by the Sloan Foundation and JISC. The proposed resource synchronization approach is based on several existing specifications (e.g. Sitemaps, PubSubHubbub, well-known URI) and is aligned with common architectural principles (e.g. REST, follow your nose).
A 15 minute video version of these slides is available at https://www.youtube.com/watch?v=ASQ4jMYytsA
Linked Data: from Library Entities to the Web of DataRichard Wallis
Presentation to the ALCTS session "International Developments in Library Linked Data: Think Globally" at the American Library Association Conference in Las Vegas - June 2014
This talk was provided by Ursula Pieper of the National Agricultural Library for the NISO Virtual Conference, Using Open Source in Your Institution, held on Feb 17, 2016
Slides from my workshop at Open Repositories 2016 about DSpace's Linked Data support. The slides include a short introduction into the Semantic Web and Linked Data, the main ideas behind the Linked Data support of DSpace, information on how to configure this feature and some examples about how to query DSpace installations for Linked Data.
CLARIAH Toogdag 2018: A distributed network of digital heritage informationEnno Meijers
Slides of my keynote at the CLARIAH Toogdag 2018 on 9 March at the National Library of the Netherlands. The main topics were the development of the distributed digital heritage network and the alignment to and cooperation with the CLARIAH infrastructure and data. It also points at some of the current limitations of the semantic web technology.
DBpedia - An Interlinking Hub in the Web of DataChris Bizer
Gives an overview of the DBpedia project and the role of DBpedia in the Web of Data, and outlines the next steps for the DBpedia project as well as ideas for using DBpedia data within the BBC.
Better together: building services for public good on top of content from the...petrknoth
CORE hosts the world’s largest collection of open access full texts, offering seamless, unrestricted access to research for citizens, researchers, libraries, software developers, funders and others. CORE’s aggregated content comes from thousands of institutional and subject repositories as well as journals and covers all research disciplines. In January 2019, CORE hit the mark of 10 million monthly active users (10.41 million). In September 2019, core.ac.uk made it into the top 5,000 websites globally by user engagement as measured by the independent Alexa Rank, making it clearly one of the world’s most widely used Open Access services.
In this talk, Petr and Nancy will explain the role of CORE in the open science ecosystem. They will introduce the solutions CORE offers for improving the delivery of research literature, including tools for discovering freely available copies of papers that might be behind publishers’ paywalls as well as a recommender system for open access literature. The use of CORE data to monitor compliance with open access policies has also recently received attention. The presenters will then reflect on the challenges in the sector and share their experience of building value-added services for the society on top of open content offered by libraries and their affiliated institutional repositories and open access journals.
Linked Data Notifications Distributed Update Notification and Propagation on ...Aksw Group
Distributed Update Notification and Propagation on the Web of Data with: Linked Data Notifications, PubSubHubbub, Semantic Pingback and Structured Feedback
An introduction deck for the Web of Data to my team, including basic semantic web, Linked Open Data, primer, and then DBpedia, Linked Data Integration Framework (LDIF), Common Crawl Database, Web Data Commons.
This presentation addresses the main issues of Linked Data and scalability. In particular, it provides details on approaches and technologies for clustering, distributing, sharing, and caching data. Furthermore, it addresses the means for publishing data through cloud deployment and the relationship between Big Data and Linked Data, exploring how some of the solutions can be transferred to the context of Linked Data.
Security and Data Ownership in the Cloud
Andrew K. Pace, Executive Director, Networked Library Services, OCLC; Councilor-at-large, American Library Association
A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary ...Robert Meusel
Promoted by major search engines, schema.org has become a widely adopted standard for marking up structured data in HTML web pages. In this paper, we use a series of large-scale Web crawls to analyze the evolution and adoption of schema.org over time. The availability of data from different points in time for both the schema and the websites deploying data allows for a new kind of empirical analysis of standards adoption, which has not been possible before. To conduct our analysis, we compare different versions of the schema.org vocabulary to the data that was deployed on hundreds of thousands of Web pages at different points in time. We measure both top-down adoption (i.e., the extent to which changes in the schema are adopted by data providers) as well as bottom-up evolution (i.e., the extent to which the actually deployed data drives changes in the schema). Our empirical analysis shows that both processes can be observed.
Presented at OR2017 - Brisbane
Panel Discussion: COAR Next Generation Repositories: Results and Recommendations
The presentation focuses on the recommended technologies to implement in repository platforms.
The nearly ubiquitous deployment of repository systems in higher education and research institutions provides the foundation for a distributed, globally networked infrastructure for scholarly communication. However, repository platforms are still using technologies and protocols designed almost twenty years ago, before the boom of the Web and the dominance of Google, social networking, semantic web and ubiquitous mobile devices.
To that end, in April 2016, COAR launched a working group to identify the technologies and architectures of the next generation of repositories. There are two threads to our work: (1) increase the exposure by repositories of uniform behaviors that can be used by machine agents to fuel novel scholarly applications that reach beyond the scope of a single repository and that enable repository content to be smoothly embedded into mainstream web applications; (2) integrate with existing scholarly infrastructures, specifically those aimed at identification, as a means to solidly embed repositories in the overall scholarly communication landscape.
This panel will present the results of the COAR Next Generation Repositories Working Group including our vision, design assumptions, use cases, architectural and technical recommendations, and next steps. The session will also include time for audience discussion and feedback.
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Artefactual Systems - AtoM
These slides accompanied a June 4th, 2016 presentation made by Dan Gillean of Artefactual Systems at the Association of Canadian Archivists' 2016 Conference in Montreal, QC, Canada.
This presentation aims to examine several existing or emerging computing paradigms, with specific examples, to imagine how they might inform next-generation archival systems to support digital preservation, description, and access. Topics covered include:
- Distributed Version Control and git
- P2P architectures and the BitTorrent protocol
- Linked Open Data and RDF
- Blockchain technology
The session is part of an attempt by the ACA to create interactive "working sessions" at its conferences. Accompanying notes can be found at: http://bit.ly/tech-Proche
Participants were also asked to use the Twitter hashtag of #techProche for online interaction during the session.
Talk given at Open Knowledge Foundation 'Opening Up Metadata: Challenges, Standards and Tools' Workshop, Queen Mary University of London, 13th June 2012.
Info on the event at http://openglam.org/2012/05/31/last-places-left-for-opening-up-metadata-challenges-standards-and-tools/
As part of the final BETTER Hackathon, project partners prepared 4 hackathon exercises. Fraunhofer IAIS organised this exercise in conjunction with external partner MKLab ITI-CERTH (EOPEN project). This step-by-step exercise featured the setup of local Docker images on Linux OS featuring Docker Compose and (pre-installed) Python, SANSA, Hadoop, Apache Spark and Apache Zeppelin. It featured semantic transformation and the use of SANSA (Scalable Semantic Analytics Stack - http://sansa-stack.net/) libraries on a sample of tweets ahead of geo-clustering.
Project website (Hackathon information): https://www.ec-better.eu/pages/2nd-hackathon
Github repository: https://github.com/ec-better/hackathon-2020-semanticgeoclustering
Next Steps for IMLS's National Digital PlatformTrevor Owens
This keynote, at the Upper Midwest Digital Collections Conference, provides an update on the National Digital Platform and 20 projects supported to enhance it. The national digital platform is a way of thinking about and approaching the digital capability and capacity of libraries across the US. In this sense, it is the combination of software applications, social and technical infrastructure, and staff expertise that provide library content and services to all users in the US. As libraries increasingly use digital infrastructure to provide access to digital content and resources, there are more and more opportunities for collaboration around the tools and services that they use to meet their users’ needs. It is possible for each library in the country to leverage and benefit from the work of other libraries in shared digital services, systems, and infrastructure.
We need to bridge gaps between disparate pieces of the existing digital infrastructure, for increased efficiencies, cost savings, access, and services. To this end, IMLS is focusing on the national digital platform as an area of priority in the National Leadership Grants to Libraries program and the Laura Bush 21st Century Librarian program. We are eager to explore how this way of thinking and approaching infrastructure development can help states make the best use of the funds they receive through the Grants to States program. We’re also eager to work with other foundations and funders to maximize the impact of our federal investment.
Project update: A collaborative approach to "filling the digital preservation...Jenny Mitcham
A presentation given by Julie Allinson at the UK Archivematica group meeting on 6th November 2015 in Leeds. It describes work underway in the "Filling the Digital Preservation Gap" project using Archivematica to preserve research data
Slides accompanying a brief talk given as part of the Archivematica User Group meeting at #SAA2016, the Society of American Archivists 2016 conference in Atlanta, GA. The user group meeting was held on August 3rd Room 309/310 in the Hilton Atlanta.
These slides offer Archivematica users a brief update on the features included in the current 1.5 release and what's on the roadmap for future releases, as well as discussion of related events and resources such as the first ArchivematiCamp in August, screencasts, and more.
A summary of DBpedia's History and a detailed analysis of challenges and solutions.
We show how the Linked Data Cloud evolved around DBpedia and also what problems we and other data projects encountered. We included a section on the new solutions that will lead DBpedia into a bright future.
This slide deck has been prepared for a workshop on Linked Data Publishing and Semantic Processing using the Redlink platform (http://redlink.co). The workshop delivered at the Department of Information Engineering, Computer Science and Mathematics at Università degli Studi dell'Aquila aimed at providing a general understanding of Semantic Web Technologies and how these can be used in real world use cases such as Salzburgerland Tourismus.
A brief introduction has been also included on MICO (Media in Context) a European Union part-funded research project to provide cross-media analysis solutions for online multimedia producers.
Slides from our tutorial on Linked Data generation in the energy domain, presented at the Sustainable Places 2014 conference on October 2nd in Nice, France
Similar to ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, and Synchronization of Resources on the Web
Memento Tracer: An Innovative Approach Towards Balancing Scale and Fidelity for Web Archiving - Martin Klein
Presentation at RESAW, The Web That Was, Amsterdam, NL, June 20, 2019
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, and Synchronization of Resources on the Web
1. An overview of capabilities and real-world use cases for discovery, harvesting, and synchronization of resources on the web
http://www.openarchives.org/rs #resourcesync
ResourceSync - ANSI/NISO Z39.99-2017
Martin Klein, Gretchen Gueguen, Mark Matienzo, Petr Knoth
2. ResourceSync was funded by the Sloan Foundation & JISC
Martin Klein, Los Alamos National Laboratory, @mart1nkle1n
3. ResourceSync - @mart1nkle1n
DPLAfest, Chicago, April 20 2017
Background - OAI-PMH
• Recurrent metadata exchange from a Data Provider to Service Providers
• XML metadata only
• Repository centric
• Devised 1999-2002, prior to REST, prior to dominance of web search engines
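An OAI-PMH harvest of the kind described above is driven entirely by URL query parameters. As a minimal sketch, the function below builds a ListRecords request, optionally with a `from` date for selective (incremental) harvesting; the repository base URL is hypothetical.

```python
from urllib.parse import urlencode

def oai_listrecords_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL for (incremental) harvesting."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:  # selective harvesting: only records changed since this date
        params["from"] = from_date
    return base_url + "?" + urlencode(params)

# Hypothetical repository endpoint:
print(oai_listrecords_url("http://repo.example.org/oai", from_date="2017-01-01"))
# http://repo.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2017-01-01
```

Note how the protocol can only express "records changed since a date" per repository, which is part of why re-synchronizing entire data sets becomes the fallback in practice.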
4. Revisit the Problem Domain - ResourceSync
• Synchronization of resources from a Source to Destinations
• Web resources, anything with an HTTP URI & representation
• Resource centric
• Devised 2012-2013, leverages key ingredients of web interoperability, existing specifications
• Updated in 2017 to v1.1
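Because ResourceSync capability documents reuse the Sitemap XML format, a Destination can read a Resource List with a plain XML parser. The sketch below extracts resource URIs and last-modified timestamps from a minimal Resource List; the example resource URI is hypothetical, and a real client (e.g. the `resync` library) would handle far more of the specification.

```python
import xml.etree.ElementTree as ET

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"   # Sitemap namespace
RS = "{http://www.openarchives.org/rs/terms/}"         # ResourceSync namespace

def parse_resource_list(xml_text):
    """Extract (uri, lastmod) pairs from a ResourceSync Resource List."""
    root = ET.fromstring(xml_text)
    md = root.find(RS + "md")
    if md is None or md.get("capability") != "resourcelist":
        raise ValueError("not a Resource List capability document")
    resources = []
    for url in root.findall(SM + "url"):
        resources.append((url.findtext(SM + "loc"), url.findtext(SM + "lastmod")))
    return resources

example = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist"/>
  <url><loc>http://example.com/res1</loc><lastmod>2017-01-02T00:00:00Z</lastmod></url>
</urlset>"""
print(parse_resource_list(example))
# [('http://example.com/res1', '2017-01-02T00:00:00Z')]
```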
14. ResourceSync Change Notifications
• Notifications about change events to resources
• Source notifies subscribed Destinations (cf. recurrent pull)
• Push-based approach via WebSub
• Similar, sitemap-based payload
• Decrease synchronization latency between Source and Destination
• Change Notification Specification v1.0
15. EHRI Use Case
• Aggregation of information about Holocaust collections held by 1,800+ organizations worldwide into a central service
• EAD as exchange format
• Diversity of data sources and locations: databases, spreadsheets (“home collections”)
https://ehri-project.eu/
http://portal.ehri-project.eu
https://twitter.com/EHRIproject
16. EHRI Use Case
• Special ResourceSync implementation
• Bridges gap between local systems and ResourceSync capability documents on a web server
• Filters local resources by subject, time period, etc.
• Set up by EHRI technical staff, run by contributing party
• Baseline synchronization: Resource Lists
• Incremental synchronization: Change Lists
• Together with EAD files moved from local system to web server (Dropbox, FTP, USB stick)
• Service: partners expose EADs, server collects and offers value-added services, e.g., graph database
17. CLARIAH Use Case
• Various institutions host evolving collections
• Make collection items uniformly available via RDF graph
• Central registry holds description of all collections
• Researchers use Virtual Research Environment to
• Discover collections (via registry)
• Collect graphs from respective institution
• Keep graphs up to date
https://www.clariah.nl/
https://twitter.com/CLARIAH_NL
18. CLARIAH Use Case
• Baseline synchronization
• Download graph from DB
• Serialized as one or more files, one RDF triple per line (+ s p o graph_name); + stands for “add”
• URIs of files listed in Resource List
• Incremental synchronization
• Changes logged in one or more files, one change per line (+/- s p o graph_name); + stands for “add”, - for “delete”
• URIs of files listed in Change List
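The line-oriented change format described above lends itself to a very small parser. The following sketch assumes plain whitespace-separated fields as shown on the slide (it ignores quoting and escaping, which a real RDF serialization would need) and applies a change log to a set of quads:

```python
def apply_change_log(lines, graph=None):
    """Apply '+'/'-' change lines to a set of (s, p, o, graph_name) quads."""
    graph = set() if graph is None else graph
    for line in lines:
        op, s, p, o, g = line.split()   # e.g. "+ ex:s1 ex:p ex:o1 ex:g"
        quad = (s, p, o, g)
        if op == "+":        # "+" stands for add
            graph.add(quad)
        elif op == "-":      # "-" stands for delete
            graph.discard(quad)
        else:
            raise ValueError("unknown operation: " + op)
    return graph

log = [
    "+ ex:s1 ex:p ex:o1 ex:g",
    "+ ex:s2 ex:p ex:o2 ex:g",
    "- ex:s1 ex:p ex:o1 ex:g",
]
print(apply_change_log(log))  # {('ex:s2', 'ex:p', 'ex:o2', 'ex:g')}
```

Replaying the baseline files first and then each change-log file in order reconstructs the current state of the graph at the Destination.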
20. Hyku & DPLA ResourceSync Implementations
Gretchen Gueguen, Data Services Coordinator
Digital Public Library of America, gretchen@dp.la
21. Project Background
● IMLS National Leadership Grant (30 months)
● Foster a national digital platform through community-based repository infrastructure
● Leverage & contribute to Hydra, both in code and community
22. Primary Project Goals
1. Develop turnkey (“easy to install, easy to maintain”) Hydra-based application that leverages and improves on core code components
2. Develop metadata aggregation & enrichment tools
3. Work toward a hosted service in the cloud
24. Metadata Aggregation @DPLA
Methods for Data Aggregation:
● OAI-PMH (21 providers)
● Custom APIs/other (8 providers)
● Direct file transfer (3 providers)
Biggest Drawbacks:
● Re-synchronizing entire data sets
● Relying on HTTP requests
25. ResourceSync and Hyku
● ResourceSync publishing support built into MVP
● Test application with 50,000 records to start
○ Limit for a single list. To add more, we would need to make a list of lists.
● Resource Lists and Change Lists are supported
● Resource or Change Dumps not currently supported
● Content negotiation for JSON-LD, N-Triples, and Turtle
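Content negotiation here means the harvester selects a serialization by sending the matching MIME type in the HTTP Accept header. A minimal sketch using Python's urllib; the record URL is hypothetical:

```python
import urllib.request

# MIME types for the three serializations the endpoint supports
FORMATS = {
    "jsonld": "application/ld+json",
    "ntriples": "application/n-triples",
    "turtle": "text/turtle",
}

def negotiated_request(url, fmt="turtle"):
    """Build a GET request asking for a specific RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": FORMATS[fmt]})

req = negotiated_request("http://hyku.example.org/records/123", "ntriples")
print(req.get_header("Accept"))  # application/n-triples
```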
26. ResourceSync and DPLA
Harvester developed for Hyku endpoint
● Development for this specific endpoint means that it’s
not a full test of all ResourceSync capabilities
● We suspect that we will prefer the Dump to the List
○ Using the List means making HTTP calls for each item in order to do
the content negotiation
○ Dump allows us to just download specifically what we need
○ We will still download records that weren't updated, but given the
typical size of the diff for each provider, this single download
may still be preferable to 100,000 HTTP requests
● Future implementations may require us to build on this
initial harvester if the specifics are different
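The dump-versus-list trade-off above comes down to packaging: a Resource Dump is one packaged file (ResourceSync packages dumps as ZIP) holding many records, so a single transfer replaces one HTTP request per record. A minimal sketch with an in-memory ZIP standing in for a real dump; the member names are illustrative:

```python
# Hedged sketch: unpacking a dump-style ZIP package of records.
# An in-memory ZIP simulates the download; filenames are illustrative.
import io
import zipfile

def read_dump(dump_bytes):
    """Return {member name: content} for every record in a dump package."""
    records = {}
    with zipfile.ZipFile(io.BytesIO(dump_bytes)) as zf:
        for name in zf.namelist():
            records[name] = zf.read(name).decode("utf-8")
    return records

# Simulate a tiny dump: two records arrive in one transfer.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("records/1.nt", "<s1> <p1> <o1> .")
    zf.writestr("records/2.nt", "<s2> <p2> <o2> .")
records = read_dump(buf.getvalue())
```

Note that a real ResourceSync dump also carries a manifest describing its members, which a fuller harvester would consult instead of iterating the archive blindly.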
27. Next Steps
Hyku:
● Possibly support Dump
● Increase test set over
50K
DPLA:
● Harvest from 3 DPLA
providers implementing
ResourceSync by end of
year
28. IIIF & ResourceSync:
Supporting discovery
Mark A. Matienzo, Stanford University Libraries
@anarchivist / https://orcid.org/0000-0003-3270-1306
DPLAFest — Chicago, Illinois — April 20, 2017
29. International Image Interoperability Framework
A community
that develops Shared APIs
implements them in Software
and exposes interoperable Content
http://iiif.io/
30. IIIF Community
http://iiif.io/community
● IIIF Consortium
○ Currently 38 state/national
libraries, universities, museums,
tech firms
○ Provides sustainability and steering
for the initiative
● Wider community
○ 80+ cultural heritage institutions,
companies, and projects using IIIF standards
○ iiif-discuss list = 670+ members
○ IIIF Slack = 300+ members
● Community & Technical
Specification Groups
31. Shared APIs
http://iiif.io/api/
● Image API
○ Transfer image pixels, regions, etc.
○ Image manipulation
● Presentation API
○ Presentation of an object (pixels +
navigation and metadata)
○ Easily share and re-use, mix and
match content
○ Annotate content
● Search API
○ Search annotations
● Authentication API
○ Provide interoperability for
access-restricted content
33. IIIF Content
All kinds of image resources:
artworks, photographs,
manuscripts, newspapers
Investigating AV and 3D
34. “Discovery”
in IIIF
Finding interoperable resources
Two main concerns:
● How can users find IIIF
resources?
● How can users then get those
resources into an environment
where they can use them?
35. Scoping the
problem
What resources
can be discovered?
Types of resources in IIIF:
● Content (Image API)
● Description (Presentation API)
The Image API does not provide
description of image content, just
technical and rights metadata.
Discovery requires Description
resources to provide information
about Content resources.
36. Presentation API
A Manifest provides
just enough metadata
(descriptive, structural,
etc.) to drive a viewer.
A Collection groups
Manifests or other
Collections.
http://iiif.io/api/presentation/2.1/
38. Presentation
API constraints
Informing decisions
The Presentation API does not
include semantic descriptions, but
can reference them using seeAlso.
IIIF (including the Presentation
API) has a resource-centric view of
the web, not a service-centric view
(cf Sitemaps/ResourceSync vs
OAI-PMH).
40. Basic Sitemaps
at NC State
● Example demonstrates use of
plain sitemaps without any
extensions (not even
ResourceSync)
● Intended to expand upon
existing practice of publishing
sitemaps from digital collections
41. Sitemap entry for manifests
<url>
<loc>https://d.lib.ncsu.edu/collections/catalog/bh1141pnc004/manifest</loc>
<lastmod>2016-12-13T15:38:19Z</lastmod>
</url>
Sitemap entry for landing page
<url>
<loc>https://d.lib.ncsu.edu/collections/catalog/bh1141pnc004</loc>
<lastmod>2017-03-27T19:33:52Z</lastmod>
</url>
Sample of NCSU Sitemaps
Courtesy Jason Ronallo, North Carolina State University
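Entries like the ones above already support selective harvesting: a consumer compares each `<lastmod>` against the time of its last harvest and fetches only newer URLs. A minimal sketch, assuming the inline sample mirrors the NCSU entries and a hypothetical cutoff date:

```python
# Sketch of selective harvesting over plain sitemap entries: keep only
# <loc> values whose <lastmod> is newer than the last harvest. The sample
# XML mirrors the NCSU entries above; the cutoff is illustrative.
import xml.etree.ElementTree as ET

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://d.lib.ncsu.edu/collections/catalog/bh1141pnc004/manifest</loc>
    <lastmod>2016-12-13T15:38:19Z</lastmod>
  </url>
  <url>
    <loc>https://d.lib.ncsu.edu/collections/catalog/bh1141pnc004</loc>
    <lastmod>2017-03-27T19:33:52Z</lastmod>
  </url>
</urlset>"""

def changed_since(sitemap_xml, cutoff):
    """Return <loc> values whose <lastmod> is after the cutoff.
    ISO 8601 UTC timestamps compare correctly as strings."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for url in root.findall(f"{SM}url"):
        loc = url.findtext(f"{SM}loc")
        lastmod = url.findtext(f"{SM}lastmod")
        if lastmod and lastmod > cutoff:
            urls.append(loc)
    return urls

to_harvest = changed_since(SITEMAP, "2017-01-01T00:00:00Z")
```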
42. Prototyping at
Europeana
Exploring Sitemaps and
extensions for discovery of
IIIF resources for harvesting
● Partnership with University
College Dublin and National
Library of Wales
● ResourceSync satisfied key
needs identified within
requirements
● ResourceSync accommodated
additional metadata prototyped
in an IIIF Sitemap Extension
● Follows several synchronization
paradigms
43. Uses Sitemaps and IIIF Extension
<url>
<loc>http://newspapers.library.wales/view/3320640</loc>
<iiif:Manifest xmlns:iiif="http://iiif.io/api/presentation/2/">
http://dams.llgc.org.uk/iiif/newspaper/issue/3320640/manifest.json
</iiif:Manifest>
<dct:isPartOf>http://dams.llgc.org.uk/iiif/newspapers/3320639.json</dct:isPartOf>
<lastmod>2014-11-08</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
Example of NLW Sitemap Entry
Courtesy Nuno Freire, Europeana
44. Uses Sitemaps and ResourceSync and DCMES as Extensions
<url>
<loc>https://digital.ucd.ie/view/ucdlib:38491</loc>
<rs:ln rel="alternate" href="https://digital.ucd.ie/view/ucdlib:38491"
type="application/json" dcterms:conformsTo="http://iiif.io/api/presentation/2.1/"/>
<rs:ln rel="collection" href="https://digital.ucd.ie/view/ucdlib:38488"
type="application/json" dcterms:conformsTo="http://iiif.io/api/presentation/2.1/"/>
<lastmod>2014-08-24T04:09:09.716Z</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
Example of UCD Resource List Entry
Courtesy Nuno Freire, Europeana
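A consumer of entries like the UCD example can discover IIIF resources by following `rs:ln` links whose `dcterms:conformsTo` points at the Presentation API. A minimal sketch; the inline XML mirrors the example above (quotes normalized) with namespaces declared for standalone parsing:

```python
# Sketch of consuming a Resource List entry: collect hrefs of rs:ln
# links that declare conformance to the IIIF Presentation API.
# The inline XML mirrors the UCD example above.
import xml.etree.ElementTree as ET

ENTRY = """<url xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:rs="http://www.openarchives.org/rs/terms/"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <loc>https://digital.ucd.ie/view/ucdlib:38491</loc>
  <rs:ln rel="alternate" href="https://digital.ucd.ie/view/ucdlib:38491"
      type="application/json"
      dcterms:conformsTo="http://iiif.io/api/presentation/2.1/"/>
</url>"""

RS = "{http://www.openarchives.org/rs/terms/}"
DCT = "{http://purl.org/dc/terms/}"

def iiif_links(entry_xml):
    """Return hrefs of rs:ln links that declare the IIIF Presentation API."""
    root = ET.fromstring(entry_xml)
    return [
        ln.get("href")
        for ln in root.findall(f"{RS}ln")
        if ln.get(f"{DCT}conformsTo", "").startswith(
            "http://iiif.io/api/presentation/")
    ]

links = iiif_links(ENTRY)
```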
45. Uses Sitemaps, ResourceSync, and Sitemap Image Extension
Sample of UCD Resource List
Courtesy John Howard, University College Dublin
46. Conclusions
Strengths
● ResourceSync addresses core requirements
for exposing IIIF resources for harvesting
● Can build on publication of existing
sitemaps easily
● Leverages Many-to-One, Selective
Synchronization, and Metadata Harvesting
paradigms
● Can adopt additional extensions to
implement needed features
● Plenty of opportunity to contribute; need
more prototypes
Challenges
● IIIF community’s needs for discovery are
not necessarily what other sitemap
consumers want (e.g. Google)
● Identifying the primary resource influences
structure
● Unclear whether search engines support
custom extensions, and what ranking
impact would be
47. Thank You!
Mark A. Matienzo, Stanford University Libraries
50. Use Case 1: What is CORE?
OA Repositories OA Journals
Mostly OAI-PMH
CORE aggregates and
provides free access to
millions of research
articles aggregated
from thousands of OA
repositories and
journals.
51. Use Case 1: What is CORE?
» Enrichment and
harmonisation of
aggregated data
» Products/services:
› Portal
› API
› Data dumps
› Recommendation
system for libraries
› Repository dashboard
› B2B and analytical
services
52. Use Case 1: What is CORE?
» 70 million+
metadata records
» Over 6 million full
texts hosted on
CORE
» ~1.5 million
monthly active
users
» Aggregating from
2,500 repositories
and 10k OA
journals
54. Use Case 1: Approach
OA Repositories OA Journals
Key publishers
(OA + hybrid OA)
Publisher connector
Mostly OAI-PMH
A range of bespoke APIs
+ many others
Provide seamless access over non-standardised APIs.
What protocol?
55. Use Case 1: Approach
What protocol? » Why not OAI-PMH?
› slow and very inefficient
for big repositories.
› Standardised for
metadata transfer but
not for content transfer.
› Very difficult to
represent the richness of
metadata from a broad
range of data providers.
58. Use Case 2: Subscribing to ResourceSync
OA Repositories OA Journals
Key publishers
(OA + hybrid OA)
Publisher connector
Mostly OAI-PMH
A range of bespoke APIs
ResourceSync
+ many others
» Other aggregators can
subscribe to the Publisher
connector to make use of their
ingestion pipelines and
enrichment technologies
59. Use Case 2: Content ingestion in OpenMinTeD
OA Repositories OA Journals
Key publishers
(OA + hybrid OA)
Publisher connector
ResourceSync
Mostly OAI-PMH
OMTD-SHARE
(over REST)
A range of bespoke APIs
+ many others
» CORE and OpenAIRE are content sources in the OpenMinTeD
TDM platform (EU infrastructure project) being developed to
enable the mining of scholarly literature.
60. Use Case 2: Exposing enriched data for TDM
OA Repositories OA Journals
Key publishers
(OA + hybrid OA)
Publisher connector
ResourceSync
Mostly OAI-PMH
A range of bespoke APIs
+ many others
ResourceSync
» But others want similar solutions … typically, they want to be
able to sync and host the data.
62. Use Case 3: Replace OAI-PMH with ResourceSync
OA Repositories OA Journals
Key publishers
(OA + hybrid OA)
Publisher connector
ResourceSync
Mostly OAI-PMH
OMTD-SHARE
(over REST)
A range of bespoke APIs
+ many others
ResourceSync
ResourceSync
» Will be a game changer …
» Advocated by COAR Next
Generation Repositories WG
67. An overview of capabilities and real-world use cases for discovery,
harvesting, and synchronization of resources on the web
http://www.openarchives.org/rs #resourcesync
ResourceSync
ANSI/NISO Z39.99-2017
@mart1nkle1n @G_AmSpinnrade @anarchivist @petrknoth