1. “Synchronize your resources with ResourceSync”
July 10, 2013, Open Repositories 2013, PEI, Canada
Synchronize your resources with ResourceSync
Simeon Warner (Cornell University Library)
2. “Synchronize your resources with ResourceSync”
Team sport
3. “Synchronize your resources with ResourceSync”
more, still more missing
JISC: Richard Jones, Graham Klyne, Stuart Lewis
OCLC: Jeff Young
LOCKSS: David Rosenthal
RedHat: Christian Sadilek
Ex Libris Inc.: Shlomo Sanders
Library of Congress: Kevin Ford
4. “Synchronize your resources with ResourceSync”
Alfred P. Sloan Foundation
5. “Synchronize your resources with ResourceSync”
Synchronize
• keep “in sync” (colloq.)
• Following changes over time
and
• Keeping copies on different systems the same
• Tackle only the unidirectional problem: From a Source, to a Destination
6. “Synchronize your resources with ResourceSync”
Resources
aka Web Resources:
have URI, HTTP GET representation(s)
Many / Few
Big / Small
Fast / Slow
8. “Synchronize your resources with ResourceSync”
Scholarly repositories
• Replicate data/articles for mirroring, reuse, indexing, ...
• OAI-PMH for metadata
• Many custom solutions for full content
9. “Synchronize your resources with ResourceSync”
Linked data
Fundamentally distributed but local copy often required. Either:
1. cache
2. sync local copy...
• Many custom solutions for local copy
[Diagram labels: BBC, Last.FM, MusicBrainz, GeoNames, DBpedia, others...]
10. “Synchronize your resources with ResourceSync”
Didn’t you sell us OAI-PMH?
Or... will ResourceSync replace OAI-PMH?
Proven metadata transfer protocol
Widely adopted in our community
X Predates REST, not “of the web”
X Not adopted for content transfer
Can replace, likely coexistence
12. “Synchronize your resources with ResourceSync”
1. Baseline sync
Initial load, copy, or catch-up from source
• need list of all resources
• optional packaged content
Want to
• avoid out-of-band setup & customization
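A minimal Python sketch of the baseline sync idea above: given a list of all resource URIs, the destination fetches each one. The names `baseline_sync`, `http_get`, and `fake_source` are hypothetical and not part of ResourceSync; the demo uses an in-memory stand-in for the network so that it runs offline.

```python
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def http_get(uri: str) -> bytes:
    """Fetch one resource over HTTP (used only when no other fetcher is given)."""
    with urlopen(uri) as response:
        return response.read()


def baseline_sync(uris: Iterable[str],
                  fetch: Callable[[str], bytes] = http_get) -> Dict[str, bytes]:
    """Copy every listed resource: one GET per URI, keyed by URI."""
    return {uri: fetch(uri) for uri in uris}


# Offline demo: a dict plays the role of the source (an assumption of this sketch).
fake_source = {
    "http://example.com/res1": b"first resource",
    "http://example.com/res2": b"second resource",
}
mirror = baseline_sync(fake_source.keys(), fetch=fake_source.__getitem__)
```

Passing the fetcher as a parameter keeps the copy loop independent of the transport, which is one way to reflect the slide's wish to avoid out-of-band setup.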
13. “Synchronize your resources with ResourceSync”
2. Incremental sync
Keep up-to-date with changes at a source
• need information about changes
• optional packaged content
• minimal primitives: create/update/delete
Want
• allow catch-up after destination offline
• lower latency and/or greater efficiency than repeated baseline sync
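The three primitives named above (create/update/delete) can be sketched as a small replay loop in Python. The tuple layout of a change and the function name `apply_changes` are assumptions made for illustration; they are not a ResourceSync document format.

```python
from typing import Dict, Iterable, Tuple

# (uri, change_type, content); change_type is one of the slide's three primitives.
Change = Tuple[str, str, bytes]


def apply_changes(local: Dict[str, bytes],
                  changes: Iterable[Change]) -> Dict[str, bytes]:
    """Return a new copy of `local` with the changes applied in order."""
    updated = dict(local)
    for uri, change_type, content in changes:
        if change_type in ("create", "update"):
            updated[uri] = content
        elif change_type == "delete":
            # pop with a default so replaying an already-applied delete is harmless
            updated.pop(uri, None)
        else:
            raise ValueError(f"unknown change type: {change_type}")
    return updated


local_copy = {"http://example.com/res1": b"v1"}
changes = [
    ("http://example.com/res1", "update", b"v2"),
    ("http://example.com/res2", "create", b"new"),
    ("http://example.com/res1", "delete", b""),
]
synced = apply_changes(local_copy, changes)
```

Because the changes are applied in order, a destination that was offline can catch up by replaying everything it missed.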
14. “Synchronize your resources with ResourceSync”
3. Audit
Destination should be able to verify whether it is synchronized with a source
• need list of all resources + fixity info
Want
• lower latency and/or greater efficiency than baseline sync
• note: subject to some latency
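An audit of this kind might look like the following Python sketch, which compares a local copy against a listing of URIs with fixity information. Using SHA-256 hex digests as the fixity value, and the names `audit` and `digest`, are assumptions of this example rather than something the slides specify.

```python
import hashlib
from typing import Dict, List


def digest(content: bytes) -> str:
    """Fixity value used in this sketch: SHA-256 as a hex string."""
    return hashlib.sha256(content).hexdigest()


def audit(source_listing: Dict[str, str],
          local: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Compare local content against a listing of URI -> expected digest."""
    report: Dict[str, List[str]] = {"missing": [], "changed": [], "extra": []}
    for uri, expected in source_listing.items():
        if uri not in local:
            report["missing"].append(uri)
        elif digest(local[uri]) != expected:
            report["changed"].append(uri)
    # Anything held locally that the source no longer lists
    report["extra"] = [uri for uri in local if uri not in source_listing]
    return report


listing = {
    "http://example.com/a": digest(b"one"),
    "http://example.com/b": digest(b"two"),
    "http://example.com/d": digest(b"four"),
}
local_copy = {
    "http://example.com/a": b"one",
    "http://example.com/b": b"stale",
    "http://example.com/c": b"left over",
}
report = audit(listing, local_copy)
```

The destination is in sync exactly when all three lists in the report are empty, subject to the latency noted above.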
17. “Synchronize your resources with ResourceSync”
Minor?
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln …/>
  <rs:md …/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:ln …/>
    <rs:md …/>
  </url>
  <url>
    …
  </url>
</urlset>
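As a rough illustration of how a destination could read such a document, here is a Python sketch using the standard library's ElementTree. It extracts only `<loc>` and `<lastmod>`; the `rs:ln` and `rs:md` elements are elided on the slide, so the sample below leaves them out rather than guessing their attributes. The function name `parse_resource_list` and the second sample entry are made up for the example.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Complete sample document; res2 and its datestamp are invented for illustration.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-03T18:00:00Z</lastmod>
  </url>
</urlset>
"""


def parse_resource_list(xml_text: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (uri, lastmod) pairs for every <url> entry in a sitemap document."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall(f"{{{SITEMAP_NS}}}url"):
        loc = url.findtext(f"{{{SITEMAP_NS}}}loc")
        lastmod = url.findtext(f"{{{SITEMAP_NS}}}lastmod")
        entries.append((loc, lastmod))
    return entries


resources = parse_resource_list(SAMPLE)
```

Elements in the `rs:` namespace are simply skipped by this reader, which is the behaviour one would expect from a plain sitemap consumer.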
18. “Synchronize your resources with ResourceSync”
Baseline sync & Google
Most basic capability is Resource List:
• Snapshot of state of resources
• URI, datestamp + optional extra fixity info
• Destination does GET on each resource
ResourceSync Baseline sync & Audit
Google/Bing/Yahoo!/etc. harvest
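To show why the most basic capability doubles as a search-engine sitemap, the Python sketch below writes a list of (URI, datestamp) pairs as a plain sitemap `<urlset>`. It emits only standard sitemap elements and leaves out fixity and other ResourceSync-specific metadata; `build_resource_list` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_resource_list(resources: Iterable[Tuple[str, str]]) -> str:
    """Serialize (uri, lastmod) pairs as a sitemap <urlset> document."""
    # Make the sitemap namespace the default so the output has no ns0: prefixes.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for uri, lastmod in resources:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = uri
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


xml_text = build_resource_list([
    ("http://example.com/res1", "2013-01-02T13:00:00Z"),
])
```

A snapshot produced this way is a list of URIs with datestamps, which is the information a baseline sync or a crawler needs to start issuing GETs.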
20. “Synchronize your resources with ResourceSync”
Extensible
Extensible use of Link Relations from Atom
• Spec describes use for mirrors, patches, historical, provenance, conneg...
• Use <rs:ln rel="your-relation-here" .../>
Extensible attributes for fixity etc.
• Includes lastmod, fixity, length, type...
Extensible framework -> new capabilities
21. “Synchronize your resources with ResourceSync”
Push = Lower latency
Pull
• easy setup, no trust required
Push Changes
• lower latency, better scaling
• same descriptions as pull
• standard transports (XMPP, Websockets...)
• can push discovery info to trigger pull
22. “Synchronize your resources with ResourceSync”
Timeline
January 2013 / June 2013 / July 2013 / Fall 2013
First beta / Version 0.9 / Update and push spec / NISO standardization
• Tools and libraries being developed to ease implementation
• Tutorials at major conferences (OAI8, OR, JCDL, ...)
0:00 I will attempt, in the next 7 minutes, to motivate creation of the ResourceSync framework and explain what it means in a slightly less circular manner than the title. But first, I cannot claim that this is all my work...
0:17 Core team comprises
0:34 Technical committee
0:51 And all this would not have been possible without funding for in-person meetings and some core team time: primary funding from Sloan, UK participation funding from Jisc.
1:08 Let me pull apart the two words of the title and framework name.
1:25 ResourceSync is about Web Resources, things on the web with a URI identifier that can be dereferenced to get one or more representations.
- the project is making an observation and a statement that repositories should really exist on the web
- from 10s on a small website to 10s of millions in big repositories
- large data resources, publications, linked data
- changes multiple times per second to infrequent changes of archival records
1:42 So far I’ve told you that a whole bunch of people are using up some generous funding to think about how to better synchronize web resources between systems. Why would we do this? What is the need? I am going to give just two example use cases. There are more in the D-Lib article from about a year ago.
1:59 There are many contexts in which copies of resources in scholarly repositories are necessary. From one repo I’m involved with, arXiv.org: mirroring, copy for index, copy for research. Currently there are either ad-hoc approaches or a resort to the very blunt instrument of web crawling.
2:16 Ironic perhaps that while linked data is fundamentally distributed, many applications require local copies. Ad-hoc approaches to bulk copy.
2:33 OAI-PMH was introduced over 12 years ago (before the first JCDL, before OR was even imagined).
2:50 We know why we need this new protocol, but what should it do? We took a BIG step back to look at the fundamentals of the synchronization problem and came up with the following 3 operations.
3:07 Use a Resource List, or a Resource Dump, which includes a Resource List as a manifest along with the actual content.
3:24
3:41
3:58 So, we have three operations; how do these get implemented? What is the lowest-barrier, most widely compatible, most performant, and most future-proof way? Preferably inventing as little new stuff as possible.
4:15 Do everything with sitemaps. We considered many options, but sitemaps won because of good match, wide adoption, simplicity, and extensibility. Minor extensions required.
4:32 Yes, really minor. Two extra elements and attributes borrowed from several other specifications, notably Atom Link Extensions. In January the Sitemaps.org folks modified their schema to allow the top-level elements, and thus all ResourceSync documents are schema-valid sitemap (or sitemap index) documents.
4:49 A really cool thing about using sitemaps is that by implementing the most basic capability, the Resource List, you are also producing a sitemap that can be used by all the major search engines.
5:06
5:23 It is just possible that we haven’t thought of everything or got everything perfect. There are three areas of extensibility: expression of relations between resources, expression of fixity and other information about resources, and, at the framework level, the addition of new capabilities.