RDA & the New World of Metadata



Slides from the keynote of the 2014 AMIGOS RDA conference, presented virtually on February 20, 2014. The conference title was "Is RDA on Your RaDAr?"

Published in: Technology

  • The real revolution in RDA is in the vocabularies, not the rules
  • MARCXML is an XML schema and was designed in part as a bridge to MODS. All XML is inherently ‘top-down’.
  • Jon Phipps: “RDF still can be thought of as a record and managed as a record, but an RDF record is a bucket without a bottom, a box with no walls”. RDF is inherently ‘bottom-up’.
  • The old MARC/ILS model of data management is totally inadequate to the management of statement-based data.
  • The FamiliarRDVocab page is in the process of changing. The Element sets are now deprecated, but the value vocabularies remain as is.
  • Deprecation allows redirect and improves ability to provide historical context. If data is deleted there is no history.
  • This website will change—should be considered ‘under construction.’ Notice that there are no separate sets for Roles and other WEMI relationships—these have been integrated with the WEMI sets.
  • Many of the important newer standards (particularly the W3C standards) rely on GitHub.
  • The problem is the difference between XML's top-down approach and RDF's bottom-up approach. In RDF you only know what type of thing is being described (the subset of all 'things' it belongs to) because of the 'domain' of the properties describing it. In XML you have to decide up front what the type of thing is, and then you describe it with the properties that the schema allows for that type of thing. The unconstrained properties, because they don't have a specific domain, make no statement about the type of thing they are describing, so you could use them to describe a mongoose as well as a book about a mongoose.
  • A ‘curie’ (often expressed in all CAPS) is a COMPACT URI, an abbreviated URI expressed in CURIE syntax, and may be considered a datatype. (Defined by the W3C and described here: http://en.wikipedia.org/wiki/CURIE) The canonical form specifies a unique representation for every element, which will not change (ergo the numeric basis); the lexical form MAY change if the label of the element changes. Both old and new lexical forms will be redirected to the canonical form.
  • Note that links are to parent property, domain, and range. In this example the domain (Work) and range (Corporate Body) constrain the use of the property in data and allow inferencing. The parent property is also constrained with a domain and range.
  • Note that in this view there is the inverse, an additional ‘subpropertyOf’ (links to the unconstrained property), and ‘sameAs’ (links to an additional URI).
  • Note that the domain (Work) and range (Corporate body) for the previous example are defined in these classes.
  • This is the unconstrained version of the property in the previous example; note that there is no domain or range. The parent in this case is another unconstrained property.
  • The relationship to the standard RDA properties allows RDA to ‘grow’ in a decentralized manner. Other communities are free to use these related properties (which will be in different namespaces and maintained by different organizations).
  • Libraries have many roles in the new data economy.
  • Crosswalking as a traditional practice makes it difficult for community members to effectively respond to decisions made behind the curtain or to contribute to better maps.
  • Teal group in upper right is Dublin Core. Light blue groups are RDA, some constrained (e, m), some unconstrained (u). Darker blue: a variety of schemas: BIBO, UniMARC, MARC 21.
  • Library administrators are reading the many studies that indicate that users are NOT USING CATALOGS, in any form (including mobile-enabled ones), and this is, in effect, the writing on the wall.
  • RDA & the New World of Metadata
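
The speaker note on XML top-down versus RDF bottom-up describes how, in RDF, the type of a thing is learned from the 'domain' of the properties used to describe it, while an unconstrained property says nothing about type. A minimal sketch of that idea in plain Python follows. It is an illustration only: the `ex:` identifiers, the `DOMAINS` table, and the `infer_types` helper are all hypothetical, and real data would use an RDF toolkit and the published RDA properties.

```python
# Hypothetical prefixes and property names, chosen for illustration only.
RDF_TYPE = "rdf:type"

# A constrained property declares the class of its subject (its domain).
# An unconstrained property declares no domain, so it has no entry here.
DOMAINS = {
    "ex:constrainedIsAdaptedAs": "ex:Work",
}


def infer_types(statements, domains):
    """Return the rdf:type statements implied by the domains of the properties used."""
    inferred = set()
    for subject, prop, _obj in statements:
        if prop in domains:
            inferred.add((subject, RDF_TYPE, domains[prop]))
    return inferred


statements = [
    # Described with a constrained property: the subject must be a Work.
    ("ex:book1", "ex:constrainedIsAdaptedAs", "ex:radioScript1"),
    # Described with an unconstrained property: nothing is implied about type,
    # so the subject could be a mongoose or a book about a mongoose.
    ("ex:mongoose1", "ex:unconstrainedIsAdaptedAs", "ex:cartoon1"),
]

inferred = infer_types(statements, DOMAINS)
for triple in sorted(inferred):
    print(triple)
```

Only the statement made with the constrained property yields a type; the subject of the unconstrained statement stays untyped, which is what makes those properties usable outside a FRBR-based model.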

    1. RDA & The New World of Metadata. Diane I. Hillmann, AMIGOS Conference, Feb. 20, 2014: "Is RDA on Your RaDAr?"
    2. What Is RDA? • You’re already familiar with the Toolkit and the ‘rules’ – You’ve heard about the problems with using the RDA instruction with MARC (maybe even tried it) – RDA as a standard includes both the instruction and the vocabularies • RDA as data has a very different model than MARC • MARC was originally designed to print cards • RDA was built around FRBR 2/20/14 AMIGOS RDA Conference 2014 2
    3. This Transition is Hard • The transition is not just from AACR2 to RDA • It’s also about: – Different views of the world – Different data models – Different distribution strategies • Linked data is part of this transition • Re-thinking our basic assumptions is critical
    4. Model of ‘the World’ / XML • XML (and MARC21) assume a ‘closed’ world (domain), usually defined by a schema: – “We know all of the data describing this resource. The single description must be a valid document according to our schema. The data must be valid.” – XML’s document model provides a neat equivalence to a metadata ‘record’ (and most of us are fairly comfortable with it)
    5. Model of ‘the World’ / RDF • RDF assumes an ‘open’ world: – “There’s an infinite amount of unknown data describing this resource yet to be discovered. It will come from an infinite number of providers. There will be an infinite number of equally valid descriptions. Those descriptions must be consistent.” – RDF’s statement-oriented data model has no notion of ‘record’ (rather, statements can be aggregated for a fuller description of a resource)
    6. Linked Data is Inherently Chaotic • Requires creating and aggregating data in a broader context – There is no one ‘correct’ record to be made from this data, no objective ‘truth’ • This approach is different from the cataloging tradition – BUT, the focus on vocabularies is familiar • Linked data relies on the RDF model
    7. The New Data Management • Managing data at the statement level rather than record level • Emphasis on evaluation coming in and provenance going out • Shift in human effort from creating standard cataloging records to knowledgeable human intervention in machine-based processes • Extensive use of data created outside libraries • Intelligent re-use of our legacy data and redistribution of our data more widely
    8. What’s Still New About RDA? • New version of the vocabularies brings together all we’ve learned since 2008 • New ‘branded’ namespace • Commitment to synchronize Toolkit & Vocabs • Optimized for the Semantic Web and linked data
    9. Understanding The New RDA • What’s [still] new about RDA? • How should we look at its current progress? • Is RDA what we need? Should we ‘wait’ for something that might be ‘better’? • How should we understand the interplay between RDA and technological changes going on concurrently?
    10. Breaking News • The RDA Vocabularies now have an updated version, and a new namespace – Old RDA Vocabularies: RDVocab.info – New RDA Vocabularies: RDARegistry.info • Old element vocabularies never published; all new vocabularies are published • Value vocabularies remaining in the older namespace (for now)
    11. [Image; no slide text]
    12. Ringing Changes • Why deprecate? – The RDVocab elements were never formally published, but they have been used – Deprecation is better than deletion in this space – Element sets are reorganized to simplify element names and better integrate relationships with other FRBR elements • What else is new? – Verbalized element names (‘has’, ‘is’, etc.) – Explicit reciprocals – Different URI strategy
    13. http://rdaregistry.info
    14. Why GitHub? • GitHub is a widely used repository with tools that enable services and documentation to be created and managed more easily – In some cases, human readable and technical versions can be created automatically – Enables detailed version information to be managed by machines and viewed by users – Supports easily generated output for use by other systems and users
    15. Constrained & Unconstrained Properties: What’s the Difference? • The FRBR ‘bounded’ properties should be seen as the official JSC-defined RDA basic Application Profile for libraries • Extensions and mapping should be built from the unconstrained properties – Unconstrained vocabularies necessary for use in domains where FRBR not assumed or inappropriate – Mapping from vocabularies not using the FRBR model directly to ones that do (and back) creates serious problems for the ‘Web of Data’ • Differences make vocabularies able to express library knowledge in the context of the Semantic Web
    16. What’s Important About the New RDA? • URI strategy optimized for multiple languages and both human and machine use • Sets reorganized to bring the relationships into the element sets • RDA enables data to be managed for both the closed world of library and local data and the open world of linked data [Image by Reed Sturtevant, FLICKR]
    17. Big Challenges/Big Ideas • Records are still important but not as we’ve used them in the past – We might want to think about records as the instantiation of a point of view – News: traditional library data has a point of view • MARC required consensus because of limitations built into the technology – For any data in statements destined for the Semantic Web, we need provenance, so we know “Who sez?”
    18. The OMR and GitHub Will Work in Tandem. RDA Vocabularies on GitHub: • Easier updating of documentation and technical versions • More flexible management options (vocabularies can be managed locally using a variety of tools) • Distributed version control • Issues management. Open Metadata Registry: • Built-in user interface designed for humans • Detailed history info on all transactions • Downloadable options • Change feed • Limited notifications • Limited options for additional documentation
    19. Download and view options
    20. Available Technical Formats
    21. [Image; no slide text]
    22. [Image; no slide text]
    23. Domains and Ranges are Classes
    24. [Image; no slide text]
    25. Vocabulary Extension • The inclusion of unconstrained properties provides a path for extension of RDA into specialized library communities, nonlibrary communities and to better support local needs – Other communities may have a different notion of how FRBR ‘aggregates’ (For example, a colorized version of a film may be viewed as a separate work) – Non-libraries may not wish to use FRBR at all – Local users may have additional, domain-specific properties to add, that could benefit from a relationship to the RDA properties
    26. rdau:isAdaptedAs, rdau:isAdaptedAsARadioScript
    27. Extension using Unconstrained Properties: rdau:isAdaptedAs, rdau:isAdaptedAsARadioScript, KidLit:isAdaptedAsAPictureBook
    28. Extension using Unconstrained Properties: rdau:isAdaptedAs, rdau:isAdaptedAsARadioScript, KidLit:isAdaptedAsAPictureBook, KidLit:isAdaptedAsAChapterBook
    29. What’s our Distribution Model? • We don’t know what you want, so choose! • We know more about what you want than you do. Here it is!
    30. Libraries as Data Publishers & Consumers • Data from library ‘publishers’ should look like a supermarket: lots of choices, with decisions made by consumers – Right now we seem to be operating as Soviet bakeries – This is not what open linked data is supposed to be doing for us • “Be conservative in what you send, liberal in what you accept” (Robustness Principle)
    31. Where You Start Affects Where You End Up • Simple metadata is more useful as output than input – The ‘long tail’ of MARC’s lesser used properties was built up over decades and shouldn’t be discarded – It’s easier to dumb down than smarten up (and not as lossy) • Dublin Core and MARC are examples of starting simple and trying to add on – MARC 21 went well beyond AACR2 in its scope – Dublin Core successful as a common mapping (most schemas map to DC) but rarely sufficient by itself
    32. Libraries as Data Publishers • If we want people outside libraries to use our data, we need to offer them choices • This strategy is supported by loss-less mapping of all of our legacy data – Not a pre-determined selection – Filtering best accomplished by data consumers, who know what they need • This requires a new strategy for managing the data – RDA, as a rich metadata model based on sound Library experience, is an excellent basis for this strategy
    33. Libraries as Data Consumers • As aggregators of relevant metadata content – Developing methods to gather and redistribute data without necessarily re-creating OCLC – Modeling and documenting best practices in metadata creation, improvement and exposure – Application profiles important in this effort • As developers of vocabularies, exposing a variety of bibliographic relationships • As innovators (particularly as part of the cultural metadata community) using social networks to enhance bibliographic description
    34. Mapping Legacy Data for Re-distribution • If we want data consumers to value our data, we should map it all – We can distribute limited ‘flavors’ as well, as we gain experience and feedback • Current crosswalking strategies are based on: – One-time, inflexible, programmatic methods that effectively hide the process from consumers – Assumptions that data must be improved at the time it is crosswalked, or never
    35. What We Mean by ‘Mapping’ [diagram of mapped properties]: dct:format, dc:format, dct:extent, unim:U215__a, m21:M300, rdam:extent, rdau:duration, rdau:extent, m21:M306__a, rdam:extentOfText, rdau:extentOfText, rdae:duration, isbd:"has extent", bibo:numVolumes, bibo:numPages, unim:U127__a
    36. Will This Shift Cost Too Much? • We need to support efforts to invest in more distributed innovation and focused collaboration • It’s the human effort that costs us – Cost of traditional cataloging is far too high, for increasingly dubious value • Our current investments have reached the end of their usefulness – All the possible efficiencies for traditional cataloging have already been accomplished • Waiting for leadership from the big players costs valuable time with no guarantees of results
    37. The Bottom Line • Our big investment is (and has always been) in our data, not our systems • Over many changes in format of materials, we’ve always struggled to keep our focus on the data content that endures, regardless of presentation format • We are in a great position to have influence on how the future develops, but we can’t be afraid to change, or afraid to fail
    38. Thank you! Questions? Contact info: metadata.maven@gmail.com Metadata Matters: http://managemetadata.com/blog
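
The extension pattern on slides 26 to 28, together with the remark on slide 31 that it is easier to dumb down than smarten up, can be sketched in plain Python. This is a rough illustration under stated assumptions, not the registry's actual mechanism: it assumes the lines in the slide diagrams stand for sub-property links, the property names are taken from the slides, and the `ex:` identifiers and the `ancestors` and `dumb_down` helpers are hypothetical.

```python
# child -> parent links, assumed here to mean rdfs:subPropertyOf.
# The rdau: and KidLit: names come from the slides; KidLit: stands for a
# locally maintained extension vocabulary in its own namespace.
SUBPROPERTY_OF = {
    "rdau:isAdaptedAsARadioScript": "rdau:isAdaptedAs",
    "KidLit:isAdaptedAsAPictureBook": "rdau:isAdaptedAs",
    "KidLit:isAdaptedAsAChapterBook": "rdau:isAdaptedAs",
}


def ancestors(prop, hierarchy):
    """List the broader properties above prop, nearest first."""
    chain = []
    seen = {prop}
    while prop in hierarchy:
        prop = hierarchy[prop]
        if prop in seen:  # guard against an accidental cycle
            break
        seen.add(prop)
        chain.append(prop)
    return chain


def dumb_down(statement, hierarchy, understood):
    """Restate a statement with the nearest property the consumer understands.

    Returns None when neither the property nor any broader one is understood.
    """
    subject, prop, obj = statement
    for candidate in [prop] + ancestors(prop, hierarchy):
        if candidate in understood:
            return (subject, candidate, obj)
    return None


# A statement made with a local extension property (ex: identifiers are made up).
statement = ("ex:work1", "KidLit:isAdaptedAsAPictureBook", "ex:work2")

# A consumer that only knows the general RDA property still gets usable data.
generic = dumb_down(statement, SUBPROPERTY_OF, understood={"rdau:isAdaptedAs"})
print(generic)
```

Going the other way is not possible from the hierarchy alone: a consumer holding only the general statement cannot recover which specific adaptation was meant, which is the sense in which starting rich and filtering later loses less.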