Challenges for a new era


Published on

Presentation delivered in Oslo, Norway, February 8, 2013.

  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide
  • Some conflicts here
  • This way of handling roles should be usable in either XML or RDF (essential for RDF)
  • Smörgåsbord Norway it is called koldtbord
  • This practice makes it difficult for community members to effectively respond to decisions made behind the curtain or to contribute to better maps
  • (a la the open source OPAC replacements)
  • Challenges for a new era

    1. 1. Challenges for the New Era Diane I. HillmannMetadata Management Associates Oslo, February 8, 2013
    2. 2. Big Challenges/Big Ideas O Changing our thinking from records to statements O Will RDA help? O Where you start affects where you end up O Shifting our ways from ROI to Potlach O Recognizing that our human resources are limited O So how do we manage this data-that-isn‟t records?Oslo 2013 2/7/13 2
    3. 3. Statements and Records O Records are still important but not as we‟ve used them in the past O We might want to think about records as the instantiation of a point of view O News: traditional library data has a point of view O MARC required consensus because of limitations built into the technology O Now we need provenance, so we know “Who said?”Oslo 2013 2/7/13 3
    4. 4. Building RDVocab: Goals O Bridge the XML and RDF worlds O Ensure ability to map between RDA and other element sets O Provide a sound platform for extension of RDA Vocabularies into new and specialized domains O Consider methods for expressing AACR2 structures in technical ways to ease the pain of transition to RDAOslo 2013 2/7/13 4
    5. 5. RDVocab Structure, Simplified O RDA Properties declared in two separate hierarchies: O An „unconstrained‟ vocabulary, with no explicit relationship to FRBR entities O A subset of classes, properties and subproperties with FRBR entities as „domains‟ O Pros: retained usability in or out of libraries; better mapping to/from non- FRBR vocabularies O Cons: still seems too complex to many SemWeb implementers (many using BIBO)Oslo 2013 2/7/13 5
    6. 6. Why Unconstrained Properties? O The „bounded‟ properties should be seen as the official JSC-defined RDA Application Profile for libraries O What‟s still lacking is the addition of the necessary constraints: datatypes, cardinality, associated value vocabularies O Extensions and mapping should be built from the unconstrained properties O Unconstrained vocabularies necessary for use in domains where FRBR not assumed or inappropriate O Mapping from vocabularies not using the FRBR model directly to ones that do (and back) creates serious problems for the „Web of Data‟Oslo 2013 2/7/13 6
    7. 7. Property (Generalized, no FRBR Semantic relationship) Web Subproperty (with relationship to one FRBR entity) FRBR Entity The Simple Case: Library Applications One Property-- One FRBR EntityOslo 2013 2/7/13 7
    8. 8. Property (Generalized, no FRBR relationship) Semantic Web Subproperty (with relationship to one FRBR entity) FRBR Entity Subproperty (with relationship to one FRBR entity) FRBR Entity Library Applications The Not-So-Simple Case: One Property—more thanOslo 2013 One FRBR Entity 8 2/7/13
    9. 9. Roles: Attributes or Properties? O In 2005, the DC Usage Board worked with LC to build a formal representation of the MARC Relators so that these terms could be used with DC O This work provided a template for the registration of the role terms in RDA (in Appendix I) and, by extension, the other RDA relationships O Role and relationship properties are registered at the same level as elements, rather than as attributes (as MARC does with relators, and RDA does in its XML schemas)Oslo 2013 2/7/13 9
    10. 10. Vocabulary Extension O The inclusion of unconstrained properties provides a path for extension of RDA into specialized library communities and non- library communities O They may have a different notion of how FRBR „aggregates‟ (For example, a colorized version of a film may be viewed as a separate work) O They may not wish to use FRBR at all O They may have additional, domain-specific properties to add, that could benefit from a relationship to the RDA propertiesOslo 2013 2/7/13 10
    11. 11. RDA:adaptedAs RDA:adaptedAsARadioScriptOslo 2013 2/7/13 11
    12. 12. RDA:adaptedAs RDA:adaptedAsARadioScri pt KidLit:adaptedAsAPictureBo ok Extension using Unconstrained PropertiesOslo 2013 2/7/13 12
    13. 13. RDA:adapted As RDA:adaptedAsARadioScr ipt KidLit:adaptedAsAPictureBo okKidLit:adaptedAsAChapterBook Extension using Unconstrained PropertiesOslo 2013 2/7/13 13
    14. 14. Where you start affects where you end up O Simple metadata is more useful as output than input O The „long tail‟ of MARC‟s lesser used properties was built up over decades and shouldn‟t be discarded O Easier to dumb down than smarten up O Dublin Core and MARC examples of starting simple and trying to add on O Distribution models are importantOslo 2013 2/7/13 14
    15. 15. Values vs. Costs O Machines cost less than people, but they can‟t replace people O Computers tend to require instructions from people to work well O But they are more consistent than people! O ROI culture vs. Potlatch culture O Is „who pays for this?‟ the right question?Oslo 2013 2/7/13 15
    16. 16. The Management Conundrum O Traditional ILS‟s haven‟t worked for us for a long time O They were built to create and manage catalog data O We can no longer invest in the catalog paradigm O Libraries are data builders, data managers, data distributors O The centralized, master record model is as dead as MARC encodingOslo 2013 2/7/13 16
    17. 17. Linked Data is Inherently Chaotic O Requires creating and aggregating data in a broader context O There is no one „correct‟ record to be made from this, no objective „truth‟ O This approach is different from the cataloging tradition O BUT, the focus on vocabularies is familiar O In the SemWeb world vocabularies are more complex than the thesauri we knowOslo 2013 2/7/13 18
    18. 18. Model of ‘the World’ /XML O XML assumes a closed world (domain), usually defined by a schema: O "We know all of the data describing this resource. The single description must be a valid document according to our schema. The data must be valid.” O XMLs document model provides a neat equivalence to a metadata record‟ (and most of us are fairly comfortable with it)Oslo 2013 2/7/13 19
    19. 19. Model of ‘the World’ /RDF O RDF assumes an open world: O "Theres an infinite amount of unknown data describing this resource yet to be discovered. It will come from an infinite number of providers. There will be an infinite number of descriptions. Those descriptions must be consistent." O RDFs statement-oriented data model has no notion of record‟ (rather, statements can be aggregated for a fuller description of a resource)Oslo 2013 2/7/13 20
    20. 20. The New Management Strategy O Statement level rather than record level management O Emphasis on evaluation coming in and provenance going out O Shift in human effort from creating standard cataloging to knowledgeable human intervention in machine-based processes O Extensive use of data created outside libraries O Intelligent re-use of our legacy dataOslo 2013 2/7/13 21
    21. 21. Is MARC Dead?O The communication format is very dead (based on standards no longer updated)O The semantics are not dead O They represent the distillation of decades of descriptive experience O As we move into a more machine-assisted world, our old concerns about the size of our legacy can be addressed O Taking the legacy records with us should be based on solutions developed using open and transparent strategies
    22. 22. What’s our Distribution Model? We know more about We don’t know what what you want than you you want, so choose! do. Here it is!Oslo 2013 2/7/13 23
    23. 23. Libraries as Data Publishers & ConsumersO Data from library „publishers‟ should look like a supermarket—lots of choices, with decisions made by consumers O Right now we seem to be operating as Soviet bakeries O This is not what open linked data is supposed to be doing for usO "Be conservative in what you send, liberal in what you accept”—Robustness Principle
    24. 24. Our Goals as Data PublishersO If we want people outside libraries to use our data, we need offer them choicesO This strategy is based on mapping all of our legacy data O Not a selection O Filtering accomplished by data consumers, who know best what they needO This requires active innovation and a new understanding of how to manage the data
    25. 25. Our Goals as Data Consumers O As aggregators of relevant metadata content O Developing methods to gather and redistribute without necessarily re-creating OCLC O Modeling and documenting best practices in metadata creation, improvement and exposure O Application profiles important in this effort O As developers of vocabularies exposing a variety of bibliographic relationships O As innovators in using social networks to enhance bibliographic descriptionOslo 2013 2/7/13 26
    26. 26. Mapping Legacy Data for Re-distributionO If we want data consumers to value our data, we should map it all O We can distribute limited „flavors‟ as well, as we gain experience and feedbackO Current mapping strategies are based on O One-time, inflexible, programmatic methods that effectively hide the process from consumers O Assumptions that data must be improved at the time it is mapped, or never
    27. 27. Oslo 2013 2/7/13 28
    28. 28. Oslo 2013 2/7/13 29
    29. 29. If we don‟t distribute our best data, how can anybody do cool stuff with it? Isn‟t that what we want? We can use the cool stuff ourselves!Oslo 2013 2/7/13 30
    30. 30. Oslo 2013 2/7/13 31
    31. 31. Oslo 2013 2/7/13 32
    32. 32. Oslo 2013 2/7/13 33
    33. 33. Harvest/Ingest Plan O Choosing data sources O There are known sources out there, some of them are of good quality, others are usable, with improvement O Tools are needed to help pull data, validate it, cache it, and set it up for evaluation O Most of these tasks can/should be set up with automated processes, with alerts to human minders when something goesOslo 2013 wrong 2/7/13 34
    34. 34. Oslo 2013 2/7/13 35
    35. 35. Metadata Evaluation O Evaluation needs to scale well beyond random sampling O Statistical and data mining tools need to be brought into the process, to provide both „overview‟ and specifics of whole data sets O Improvement specifications, techniques, quality criteria and tools need to be iterative, granular, and shareableOslo 2013 2/7/13 36
    36. 36. Oslo 2013 2/7/13 37
    37. 37. Testing, Monitoring & Re- evaluation O Data will change, and processes must be able to detect that, based on data profiles O Human intervention should be limited O Tools need to be built so that non- programmers can run them O Reading logs, monitoring error reports, checking results, writing specs, can/should be done by data specialists (a.k.a. catalogers w/training) O Looking for opportunities for programmers and catalogers to learn together is essential!Oslo 2013 2/7/13 38
    38. 38. Oslo 2013 2/7/13 39
    39. 39. Re-distribution Plan O If we improve data, we need to expose how we did it (and what we did), for the use of downstream consumers O New metadata provenance efforts are designed to do this at the statement level O This strategy can only exist successfully where open licenses allow innovation and wide re-use O Ideally, distribution AND redistribution should be accomplished with Application ProfilesOslo 2013 2/7/13 40
    40. 40. Will This Shift Cost Too Much? O It‟s the human effort that costs us O Cost of traditional cataloging is far too high, for increasingly dubious value O Our current investments have reached the end of their usefulness O All the possible efficiencies for traditional cataloging have already been accomplished O Waiting for leadership from the big players costs us valuable time with no guarantees of results O We need to figure out how to invest in more distributed innovation and focused collaborationOslo 2013 2/7/13 41
    41. 41. What About the Millions? O Our legacy MARC data is already a „graph‟, but the resources defined there have no internet resolvable identity O But even the transcribed text can be hugely valuable, with effort and software to help O Projects like the eXtensible Catalog have made an excellent start in demonstrating this point O MARC 21 is already available as basicOslo 2013 2/7/13 42 RDF
    42. 42. The Bottom Line O Our big investment is (and has always been) in our data, not our systems O Over many changes in format of materials, we‟ve always struggled to keep our focus on the data content that endures, regardless of presentation format O We are in a great position to have influence on how the future develops, but we can‟t be afraid to change, or afraid to 2/7/13 failOslo 2013 43
    43. 43. Contact info: metadata.maven Metadata Matters: http://managemetadata.c om/blogOslo 2013 44