This document discusses issues with the current MARC metadata standard and proposes moving to linked data standards like BIBFRAME and RDF. It outlines problems with MARC such as being difficult for computers to process, containing free-text fields, and lacking unique identifiers. It then discusses efforts to model library data using standards like BIBFRAME, SKOS, and RDF versions of RDA to address these problems and make data more interoperable on the web. The document argues that libraries need to assign URIs to entities and publish and link data to transition to a linked open data approach.
MARC and BIBFRAME; Linking libraries and archives
Lecture for LIS 644 "Digital Trends, Tools, and Debates." Not my strong point, so I won't swear there are no errors. If you reuse, please respect the CC-BY-NC-SA license on the photo.
2. We built MARC when
[catalog cards]
stood between us and patron.
Photo: Deborah Fitchett, “Catalogue cards” http://www.flickr.com/photos/deborahfitchett/2970373235/ CC-BY
3. We built MARC when
the world was clearly bounded.
Photo: NASA Goddard Photo and Video, “NASA Blue Marble” http://www.flickr.com/photos/gsfc/4392965590/ CC-BY
4. These days,
[a computer]
stands between us and patron.
Photo: Declan Jewell, “My Desk” http://www.flickr.com/photos/declanjewell/2743737312 CC-BY
5. These days, the
world’s looking a bit fractal!
Photo: NASA Goddard Photo and Video, “Still centered over the Atlantic” http://www.flickr.com/photos/gsfc/4409800816/ CC-BY
6. From what you read...
•What difficulties are programmers
finding in the MARC/AACR2/ISBD(G)
way of doing things?
•Especially keep in mind what the programmers are trying to accomplish!
•What solutions are they recommending?
•(yes, I know a lot of what I had you read is pure kvetching)
•How do humans differ from computers?
What does that mean for cataloging?
7. Problems with MARC/AACR2/ISBD
(if you’re a networked computer)
•Globally-unique identifiers for what’s in our
bibliographic universe?
•And what IS in our bibliographic universe, anyway?
•Interoperability? Who speaks MARC outside
libraries?
•This is a problem on both ends of the pipeline, these days!
•FREE TEXT (for anything not transcribed) MUST DIE.
•It is the LEAST consistent, internationalizable, interoperable way to record
information on a computer.
•Put another way: we haven’t controlled all the cataloging practices we usefully could.
http://robotlibrarian.billdueber.com/isbn-parenthetical-notes-bad-marc-data-1/
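A minimal Python sketch of why free text hurts, in the spirit of the ISBN post linked above. The sample 020 $a strings and the split_isbn helper are made up for illustration; the sketch does not verify check digits, and real records are messier than this.

```python
import re

# Hypothetical 020 $a values: the identifier and a free-text
# qualifier share a single string, so code has to guess where one ends.
RAW_020A = [
    "0394170660 (Random House : pbk.)",
    "9780141439518 (pbk.) :",
    "0-19-852663-6",
    "ISBN pending",
]

# Leading run of digits/hyphens ending in a digit or X.
ISBN_RE = re.compile(r"^\s*([\d-]{9,16}[\dXx])\b")

def split_isbn(value):
    """Return (isbn, leftover_text); isbn is None if nothing ISBN-like leads."""
    match = ISBN_RE.match(value)
    if match:
        isbn = match.group(1).replace("-", "").upper()
        if len(isbn) in (10, 13):
            return isbn, value[match.end():].strip()
    return None, value.strip()

for raw in RAW_020A:
    print(split_isbn(raw))
```

Every consumer of the data has to write (and debug) something like this. Had the qualifier been recorded separately from the identifier, none of it would be needed.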
8. Speaking of free text...
•What kinds of problems did you run into as you
were working with MODS?
•Case
•Enumerations
•Remembering end-tags!
•What does that suggest to you about humans,
text, and accuracy?
•Seriously, people, PARSE AND VALIDATE YOUR XML. Just do it.
•I have to, too, and I’ve been XMLing for a decade and a half, almost.
•Linked-data rallying cry: “Things not strings!”
•And now you begin to understand why.
•(Known psychological phenomenon: humans RECOGNIZE text far more accurately than
they can PRODUCE or REPRODUCE it. Wouldn’t it be nice if our cataloging and
metadata tools respected that?)
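A minimal sketch of the "parse your XML" half of that advice, using only Python's standard library. The two snippets are simplified, namespace-free stand-ins for MODS, and well_formed is a hypothetical helper name. Note the limit: this checks well-formedness only. Catching a wrong enumeration value or a consistently mis-cased element name needs validation against the MODS schema, which the standard library does not do.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free stand-ins for MODS records (illustration only).
GOOD = "<mods><titleInfo><title>Moby Dick</title></titleInfo></mods>"
BAD = "<mods><titleInfo><title>Moby Dick</titleInfo></mods>"  # forgot </title>

def well_formed(xml_text):
    """Return (True, None) if the text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None

print(well_formed(GOOD))
print(well_formed(BAD))
```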
9. Problems
•The MARC format is old, obscure, and difficult
to parse.
•There are programming libraries in many popular languages that read MARC. If you
ever have to work with MARC records programmatically, use them!
•Despite those, MARC’s obscurity isolates library data from other information
communities, especially on the Web.
•Free-text fields/parts in MARC are uncontrolled,
and often inconsistent in practice. Fixed fields
don’t contain all the info they perhaps should.
•Use of strings (text) rather than identifiers.
•Review: What makes a good identifier for a computer?
•Errors. Errors! ERRORS!!!!!
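To see why the advice above is "use an existing library," here is a rough Python sketch of the binary layout MARC records travel in: a 24-character leader, a directory of 12-character entries, then delimited fields. The build_record and parse_record helpers and the sample record are hypothetical and handle only the simplest case; treat this as an illustration of the format's obscurity, not as a substitute for a real MARC library.

```python
FT, RT, SF = b"\x1e", b"\x1d", b"\x1f"  # field, record, subfield delimiters

def build_record(fields):
    """Pack [(tag, data_bytes), ...] into a minimal MARC-style record."""
    directory, body = b"", b""
    for tag, data in fields:
        data += FT  # each field ends with a field terminator
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += b"%s%04d%05d" % (tag.encode("ascii"), len(data), len(body))
        body += data
    base = 24 + len(directory) + 1   # leader + directory + its terminator
    total = base + len(body) + 1     # plus the record terminator
    leader = b"%05dnam a22%05d a 4500" % (total, base)
    return leader + directory + FT + body + RT

def parse_record(raw):
    """Walk the directory and return [(tag, field_text), ...]."""
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])            # where field data begins
    directory = raw[24:base - 1]         # drop the directory terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = raw[base + start:base + start + length - 1]  # drop terminator
        fields.append((tag, data.decode("utf-8")))
    return fields

SAMPLE = [
    ("001", b"12345"),
    ("245", b"10" + SF + b"aMoby Dick /" + SF + b"cHerman Melville."),
]
RECORD = build_record(SAMPLE)
for tag, text in parse_record(RECORD):
    print(tag, text.split("\x1f"))
```

Nothing in the bytes says what "245" or "$c" means; that knowledge lives in documentation and in the heads of catalogers, which is part of what isolates the format.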
10. More problems
•Some MARC tags are ambiguous.
•What kind of URL is in an 856?
•Sometimes the same information (from a
computer’s point of view) is scattered across
a bunch of MARC fields.
•Some MARC tags relate to other MARC tags,
but are not explicitly correlated in any way.
•Very hard to figure out a work’s carrier
(physical format).
•Want to give your patrons a list of DVDs? You are OUT OF LUCK.
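A hedged sketch of the DVD problem in Python. Records here are plain dicts of tag to string, which is a simplification, and looks_like_dvd is a hypothetical helper. It assumes that an 007 with "v" in both position 00 and position 04 codes a videorecording on DVD, and that the 538 and 300 fields may carry free-text hints; check those assumptions against the MARC 21 documentation before relying on them.

```python
def looks_like_dvd(record):
    """Return the list of fields that hint this record describes a DVD."""
    hints = []
    f007 = record.get("007", "")
    # Assumed coding: 007/00 'v' = videorecording, 007/04 'v' = DVD.
    if len(f007) > 4 and f007[0] == "v" and f007[4] == "v":
        hints.append("007")
    # Free-text system details note, e.g. "DVD, region 1".
    if "dvd" in record.get("538", "").lower():
        hints.append("538")
    # Physical description; "videodisc" is a weak hint (could be Blu-ray).
    if "videodisc" in record.get("300", "").lower():
        hints.append("300")
    return hints

# Hypothetical records for illustration.
coded = {"007": "vd cvaizq", "300": "1 videodisc (120 min.)"}
text_only = {"538": "DVD, region 1, widescreen."}
book = {"300": "xii, 300 p. ; 24 cm."}

for rec in (coded, text_only, book):
    print(looks_like_dvd(rec))
```

Three fields, three different encodings, and none of them is guaranteed to be present. That is what "scattered" means from the computer's point of view.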
11. Fundamental problems
•“Machine readable” vs. “machine actionable.”
•This problem breaks down into, um, where data get broken down. With MARC, that
happens after record creation, and isn’t reliable. With linked data, it happens up-front.
•Every minute a programmer spends coping with
non-computer-friendly practices is a minute NOT
spent enhancing library experiences for patrons.
•Every minute a cataloger spends creating
computer-unfriendly data is a minute wasted.
•We are cataloging for computers now, not just
catalog-card-reading humans.
•Put another way, computers are today’s intermediary between catalogers and patrons,
as catalog cards once were. We were card-friendly. We must become computer-friendly.
12. Have you noticed?
•Programmer: “Trying to solve Patron
Problem X. MARC causes Complication Y.”
•Cataloger: “LALALALA MARC LALALA
TRADITION LALALALALA I CAN’T
HEEEEEEEEEEAR YOOOOOOOOOU!”
•wait, where did the patron and her problem go?
•Seriously, librarianship? Seriously? We have
to stop this.
•If programmers demonstrate it’s a problem, IT IS A PROBLEM. I don’t care about
the tradition or the standard that caused the problem. Solve the problem!
13. The “open” in LOD
•Who owns metadata?
•Who thinks they own it?
•If metadata are both ownable and owned,
what does that mean for linked data?
•Conversely, can linked data provide a lever against owned metadata?
•How is OCLC treating this issue?
•How is DPLA treating it?
14. SPARQL
•With XML data, you generally just dump
it on the web and let people figure out
what (if anything) to do with it.
•This means a lot of translator-writing and bandwidth cost.
•(There’s an XML query language called XQuery, but nobody uses it.)
•You can do this with RDF too (and some do), but it’s not really ideal.
•SPARQL: query language for RDF.
•Looks a LOT like SQL, intentionally so. The hardest thing to get to grips
with is namespace declarations, and that’s not really all that hard.
•“SPARQL endpoint”: a URL for a given set of RDF data that you can send
queries to and get answers from.
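A small sketch of what a SPARQL query looks like and how it travels to an endpoint, written in Python. The endpoint URL and the person URI are hypothetical placeholders, and query_url is a made-up helper; the sketch only builds the request URL (the query goes in a "query" parameter) and does not contact any server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "https://example.org/sparql"  # hypothetical endpoint URL

# Namespace declaration first, then an SQL-looking SELECT ... WHERE.
QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?book ?title
WHERE {
  ?book dcterms:creator <https://example.org/people/melville> ;
        dcterms:title ?title .
}
LIMIT 10
"""

def query_url(endpoint, query):
    """Build the GET URL that carries a SPARQL query to an endpoint."""
    return endpoint + "?" + urlencode({"query": query.strip()})

url = query_url(ENDPOINT, QUERY)
print(url)
```

The PREFIX line is the namespace declaration mentioned above; the rest reads much like SQL with triple patterns in place of table joins.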
15. How?
•If linked data is where the world is moving...
•If data need to be open to be linked...
•If libraries and archives are sitting on a mass
of unlinked and possibly unlinkable data...
•How do we get there from here?
•And where’s “there” anyway?
16. Linked Data principles
http://www.w3.org/DesignIssues/LinkedData.html
•use URIs as names for things
•use HTTP URIs (aka URLs) so that people can
look up those things
•(this is one of Linked Data’s concessions to pragmatism, compared to the
original SemWebbers)
•when someone looks up a URI, provide
useful information, using the standards
•include links to other URIs so that they
can discover more things
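One way to picture "look up a URI, get useful information": the client asks for the thing's HTTP URI and says which standard format it wants back. The Python sketch below only prepares such a request and never sends it; the URI is a hypothetical placeholder, lookup_request is a made-up helper, and Turtle is just one RDF serialization a server might offer.

```python
from urllib.request import Request

def lookup_request(uri, media_type="text/turtle"):
    """Prepare (but do not send) an HTTP GET asking for RDF about a URI."""
    return Request(uri, headers={"Accept": media_type}, method="GET")

# Hypothetical URI naming a thing, not a web page about it.
req = lookup_request("https://example.org/id/work/moby-dick")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would perform the lookup; the data that comes back should include links to other URIs, and the client repeats the process to discover more.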
18. The road ahead
1.Model our universe of things in a linked-data-
friendly fashion.
•This is what Coyle and Hillmann, BIBFRAME, SKOS, various national-library
efforts, VIAF, Dublin Core, EAC-CPF, and to some extent RDA are working on.
•(“Things” != “just things.” For linked data, people, places, and subjects are also
things.)
2.Atomize our existing data as best we can.
3.Assign URL identifiers to everything in sight.
4.Publish, link out, and link up!
5.(Squelch OCLC’s ownership claims. We can’t
have that if we want LOD or even just LD.)
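Steps 2 and 3 above can be sketched in a few lines of Python: break a flat record into separate statements and give each thing an identifier. Everything here is an assumption made for illustration: the example.org namespace, the mint and atomize helpers, and above all the idea of minting URLs from labels, which is fragile. Real projects would more likely reconcile names against an existing authority such as VIAF and use opaque identifiers.

```python
from urllib.parse import quote

BASE = "https://example.org/id/"   # hypothetical namespace we control
DCT = "http://purl.org/dc/terms/"  # Dublin Core terms namespace

def mint(kind, label):
    """Mint a URL for a thing from its label (illustration only)."""
    slug = quote(label.lower().replace(" ", "-"), safe="-")
    return f"{BASE}{kind}/{slug}"

def atomize(record):
    """Turn one flat record into (subject, predicate, object) triples."""
    work = mint("work", record["title"])
    triples = [(work, DCT + "title", record["title"])]  # object is a string
    for name in record["creators"]:
        triples.append((work, DCT + "creator", mint("person", name)))
    for subject in record["subjects"]:
        triples.append((work, DCT + "subject", mint("subject", subject)))
    return triples

record = {
    "title": "Moby Dick",
    "creators": ["Melville, Herman"],
    "subjects": ["Whaling"],
}
for triple in atomize(record):
    print(triple)
```

Only the title stays a string; the creator and subject become things with their own URLs, which is what makes step 4 (linking out) possible.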
19. See this process in action
•http://www.slideshare.net/philjohn/linked-library-data-in-the-wild-8593328
21. BIBFRAME
•LoC got tired of all the waffling about how
to replace MARC.
•10/31/2011: “We’re going to just DO THIS. Join in or don’t.”
•NISO (whose Z39.2 standard underlies the MARC record format), ILS vendors, catalogers: *have kittens*
•LoC: “Cope.” (You can tell where my sympathies lie, yes?)
•Not the first or the only; the British Library
has been working on its own LD
infrastructure for a couple years now.
•Spain’s national library has an interesting RDFized FRBRish
implementation.
24. SKOS
•Simple Knowledge Organization System
•from our friends at the W3C; builds on prior work
•http://www.w3.org/2004/02/skos/
•Representation of common controlled-
vocabulary structures in RDF syntax
•Review: What does a thesaurus entry
look like?
25. Things in SKOS
•Concepts (we know them as “terms”)
•And “Concept Schemes,” which represent CVs as we’re used to thinking of them
•Labels
•Review: why are these distinct from Concepts?
•Relationships among concepts
•Broader/narrower
•Associative (“see also”)
•Equivalencies and near-equivalencies (here there be dragons)
•Notes (of various kinds)
•Really pretty straightforward, for RDF!
•And has made inroads outside libraries for that very reason
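A toy illustration of the SKOS pieces listed above, with triples held as plain Python tuples. The SKOS namespace and property names are the real ones; the example.org vocabulary, its concepts, and the two helper functions are invented for this sketch.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "https://example.org/vocab/"  # hypothetical concept scheme

# (subject, predicate, object): labels are strings, relationships are URIs.
TRIPLES = [
    (EX + "cats", SKOS + "prefLabel", "Cats"),
    (EX + "cats", SKOS + "altLabel", "Felines, domestic"),
    (EX + "cats", SKOS + "broader", EX + "pets"),
    (EX + "cats", SKOS + "related", EX + "cat-owners"),
    (EX + "pets", SKOS + "prefLabel", "Pets"),
    (EX + "siamese", SKOS + "prefLabel", "Siamese cat"),
    (EX + "siamese", SKOS + "broader", EX + "cats"),
]

def pref_label(concept):
    """Return the preferred label of a concept, or None."""
    for s, p, o in TRIPLES:
        if s == concept and p == SKOS + "prefLabel":
            return o
    return None

def narrower(concept):
    """Find narrower concepts by reading skos:broader backwards."""
    return [s for s, p, o in TRIPLES if p == SKOS + "broader" and o == concept]

print(pref_label(EX + "cats"))
print([pref_label(c) for c in narrower(EX + "cats")])
```

The concept is the URL; the labels hang off it. Change "Cats" to "Domestic cats" and nothing that points at the concept breaks.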
26. EAD
•Attempt to model EAD “things”
•http://archiveshub.ac.uk/locah/2010/09/28/model-a-first-cut/
•Things
•Unit of Description
•Archival Finding Aid
•Repository (an Agent)
•Origination (an Agent)
•“Things” (access points, index terms). Review: What are archival access points?
•Best I can tell, this work hasn’t been taken up yet.
27. LOCAH Project, “The ‘things’ in EAD” http://archiveshub.ac.uk/locah/2010/09/28/model-a-first-cut/
28. RDFizing RDA
•What does RDA actually talk about?
•FRBR model: Group 1, 2, and 3 entities
•(though Group 1 is still kind of squidgy, really, and some application
developers are questioning its usefulness)
•DCMI model (because life can NEVER be simple)
•Relationships among entities
•What do we want to say about them?
•Are there existing ways to say these things that are good enough for our
purposes? Can we reuse them, or at least map to them?
•When there aren’t, how do we say what we need to in ways that are most
useful for the rest of the world?
•Assigning URIs to it all
29. Model friction
•FRBR: entity-relationship model
•... like relational databases, which is nice
•not entirely RDFish, which is not quite so nice and is causing head-scratching
•But head-scratching is normal in this space! Modeling is hard!
•FRBR does give us some abstractions to
model and assign URIs to.
•And IFLA was supposed to do that... but they haven’t.
•So the RDA folks have provisionally done it: FRBRoo.
•Should IFLA get back in the game, formal equivalences will be defined and
published between FRBRoo and whatever IFLA comes up with.
•FRBR isn’t perfect. (Gasp. I know, right?)
•So sticking strictly to FRBR as we model (relationships particularly) causes
problems for music and multimedia catalogers, among others.
30. RDA Vocabularies
•Hillmann, Dunsire, et al. try to work through
RDFizing RDA.
•It’s crazy complicated. And weird. So I’m not
even trying to explain it here.
•If you REALLY WANT TO KNOW: http://www.dlib.org/dlib/january10/hillmann/01hillmann.html
•OR Karen Coyle’s Library Technology Report “RDA Vocabularies for a
Twenty-First-Century Data Environment”
•Would a cataloger need to know this?
•Probably not. It’s plumbing, really.
32. We’ve seen some already...
•URLized Dublin Core concepts
•MODS accepting URLs
•VIAF
•id.loc.gov
•One more I won’t talk about today:
Dewey Decimal (dewey.info)
35. EAC-CPF and SNAC
•EAC-CPF: Encoded Archival Context for
Corporate Bodies, Persons, and Families
•Archival authority standard, from the kindly folks who brought us EAD!
•Not linked-data-ized. Yet.
•SNAC: The Social Networks and Archival
Context Project
•Using EAC-CPF to link people across archival collections
•Aha! Starting to sound more linked-data! (It’s not RDFfy, yet, but should be
relatively easy to RDFize later.)
•http://socialarchive.iath.virginia.edu/
•Moral: You don’t have to use or even know
RDF to start getting ready for linked data!
36. ORCID and ISNI
•Open Researcher and Contributor ID
•Authority control for scholars who don’t write books
•http://orcid.org/
•International Standard Name Identifier
•Thinks of itself as a superset of VIAF, ORCID, etc.
•Remains to be seen whether they can pull this off, of course...
•http://isni.org/
•Review: what does linked data do about
two URLs identifying the same thing?
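One common linked-data answer to that review question is an owl:sameAs statement connecting the two URLs. The Python sketch below groups identifiers connected by such statements; the owl:sameAs URI is the real one, while the three identifier URLs and the same_as_groups helper are hypothetical, and real-world sameAs data is often less trustworthy than this tidy example suggests.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def same_as_groups(triples):
    """Group URIs that are connected, directly or not, by owl:sameAs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # shorten the chain as we walk it
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            parent[find(s)] = find(o)  # merge the two groups

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())

# Hypothetical identifiers for one person in three systems.
A = "https://example.org/viaf/123"
B = "https://example.org/orcid/0000-0000"
C = "https://example.org/isni/999"
DATA = [
    (A, OWL_SAME_AS, B),
    (B, OWL_SAME_AS, C),
    (A, "http://purl.org/dc/terms/title", "not an identity statement"),
]
print(same_as_groups(DATA))
```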
38. Europeana
•Review: what is Europeana?
•Built a “Europeana Semantic Elements” set
•Dublin Core Application Profile
•Kludgy, as anything with Dublin Core inevitably is
•Moving to “Europeana Data Model”
•Publishing linked open data at data.europeana.eu
•For more: http://www.niso.org/publications/isq/2012/v24no2-3/isaac/
39. Missouri History Museum
•Needed to create a search portal to disparate
metadata silos
•Gee, where have we heard THAT before?
•Decided to crosswalk to RDF.
•Work in progress, but initial results encouraging.
•http://www.museumsandtheweb.com/mw2012/papers/using_an_rdf_data_pipeline_to_implement_cross_
40. Had enough? Okay.
•Copyright 2013 by Dorothea Salo.
•This lecture and slide deck are licensed
under a Creative Commons Attribution
3.0 United States License.
•Several diagrams reproduced under fair
use.