SlideShare a Scribd company logo
1 of 44
Challenges for the
    New Era
       Diane I. Hillmann
Metadata Management Associates
     Oslo, February 8, 2013
Big Challenges/Big Ideas
     O Changing our thinking from records to
        statements
            O Will RDA help?
     O Where you start affects where you end up
     O Shifting our ways from ROI to Potlach
     O Recognizing that our human resources
       are limited
     O So how do we manage this data-that-isn‟t
       records?

Oslo 2013                                      2/7/13   2
Statements and Records
     O Records are still important but not as
        we‟ve used them in the past
            O We might want to think about records as
              the instantiation of a point of view
            O News: traditional library data has a point of
              view
     O MARC required consensus because of
        limitations built into the technology
            O Now we need provenance, so we know
              “Who said?”
Oslo 2013                                              2/7/13   3
Building RDVocab: Goals

     O Bridge the XML and RDF worlds
     O Ensure ability to map between RDA and
       other element sets
     O Provide a sound platform for extension of
       RDA Vocabularies into new and
       specialized domains
     O Consider methods for expressing AACR2
       structures in technical ways to ease the
       pain of transition to RDA

Oslo 2013                                    2/7/13   4
RDVocab Structure, Simplified
     O RDA Properties declared in two separate
        hierarchies:
            O An „unconstrained‟ vocabulary, with no explicit
              relationship to FRBR entities
            O A subset of classes, properties and
              subproperties with FRBR entities as „domains‟
     O Pros: retained usability in or out of
       libraries; better mapping to/from non-
       FRBR vocabularies
     O Cons: still seems too complex to many
       SemWeb implementers (many using
       BIBO)
Oslo 2013                                                 2/7/13   5
Why Unconstrained
                   Properties?
     O The „bounded‟ properties should be seen as
        the official JSC-defined RDA Application
        Profile for libraries
            O What‟s still lacking is the addition of the necessary
              constraints: datatypes, cardinality, associated
              value vocabularies
     O Extensions and mapping should be built from
        the unconstrained properties
            O Unconstrained vocabularies necessary for use in
              domains where FRBR not assumed or
              inappropriate
            O Mapping from vocabularies not using the FRBR
              model directly to ones that do (and back) creates
              serious problems for the „Web of Data‟
Oslo 2013                                                       2/7/13   6
Property (Generalized, no FRBR
                                                 Semantic
                  relationship)
                                                   Web

    Subproperty (with relationship to one FRBR
                       entity)


                                                  FRBR Entity




                                           The Simple Case:
            Library Applications            One Property--
                                           One FRBR Entity
Oslo 2013                                            2/7/13     7
Property (Generalized, no FRBR
                  relationship)
                                                    Semantic
                                                      Web

            Subproperty (with relationship to one
                       FRBR entity)
                                                    FRBR Entity
            Subproperty (with relationship to one
                       FRBR entity)
                                                    FRBR Entity




            Library Applications The Not-So-Simple Case:
                                 One Property—more than
Oslo 2013                            One FRBR Entity 8
                                               2/7/13
Roles: Attributes or
                Properties?
     O In 2005, the DC Usage Board worked with LC
       to build a formal representation of the MARC
       Relators so that these terms could be used
       with DC
        O This work provided a template for the
          registration of the role terms in RDA (in
          Appendix I) and, by extension, the other
          RDA relationships
     O Role and relationship properties are
       registered at the same level as elements,
       rather than as attributes (as MARC does with
       relators, and RDA does in its XML schemas)

Oslo 2013                                       2/7/13   9
Vocabulary Extension
     O The inclusion of unconstrained properties
        provides a path for extension of RDA into
        specialized library communities and non-
        library communities
         O They may have a different notion of how
            FRBR „aggregates‟ (For example, a
            colorized version of a film may be viewed
            as a separate work)
         O They may not wish to use FRBR at all
         O They may have additional, domain-specific
            properties to add, that could benefit from a
            relationship to the RDA properties
Oslo 2013                                           2/7/13   10
RDA:adaptedAs




            RDA:adaptedAsARadioScript




Oslo 2013                               2/7/13   11
RDA:adaptedAs

                 RDA:adaptedAsARadioScri
                 pt


      KidLit:adaptedAsAPictureBo
      ok
       Extension using Unconstrained Properties

Oslo 2013                              2/7/13   12
RDA:adapted
 As
                  RDA:adaptedAsARadioScr
                  ipt


               KidLit:adaptedAsAPictureBo
               ok

KidLit:adaptedAsAChapterBo
ok
      Extension using Unconstrained Properties
Oslo 2013                            2/7/13   13
Where you start affects where you
              end up
     O Simple metadata is more useful as output
        than input
            O The „long tail‟ of MARC‟s lesser used
              properties was built up over decades and
              shouldn‟t be discarded
     O Easier to dumb down than smarten up
     O Dublin Core and MARC examples of
       starting simple and trying to add on
     O Distribution models are important
Oslo 2013                                             2/7/13   14
Values vs. Costs
     O Machines cost less than people, but they
        can‟t replace people
            O Computers tend to require instructions
              from people to work well
            O But they are more consistent than people!
     O ROI culture vs. Potlatch culture
            O Is „who pays for this?‟ the right question?




Oslo 2013                                              2/7/13   15
The Management Conundrum
     O Traditional ILS‟s haven‟t worked for us for
        a long time
            O They were built to create and manage
              catalog data
            O We can no longer invest in the catalog
              paradigm
     O Libraries are data builders, data
        managers, data distributors
            O The centralized, master record model is as
              dead as MARC encoding
Oslo 2013                                              2/7/13   16
Linked Data is Inherently
                Chaotic
     O Requires creating and aggregating data in
        a broader context
            O There is no one „correct‟ record to be made
              from this, no objective „truth‟
     O This approach is different from the
        cataloging tradition
            O BUT, the focus on vocabularies is familiar
     O In the SemWeb world vocabularies are
        more complex than the thesauri we know
Oslo 2013                                             2/7/13   18
Model of ‘the World’ /XML
     O XML assumes a 'closed' world (domain),
        usually defined by a schema:
         O "We know all of the data describing this
           resource. The single description must
           be a valid document according to our
           schema. The data must be valid.”
         O XML's document model provides
            a neat equivalence to a
            metadata 'record‟ (and most of
            us are fairly comfortable with it)
Oslo 2013                                      2/7/13   19
Model of ‘the World’ /RDF
     O RDF assumes an 'open' world:
       O "There's an infinite amount of unknown
             data describing this resource yet to be
             discovered. It will come from an infinite
             number of providers. There will be an
             infinite number of descriptions. Those
             descriptions must be consistent."
            O RDF's statement-oriented data
             model has no notion of 'record‟
             (rather, statements can be
             aggregated for a fuller description of
             a resource)
Oslo 2013                                                2/7/13   20
The New Management
                  Strategy
     O Statement level rather than record level
         management
     O   Emphasis on evaluation coming in and
         provenance going out
     O   Shift in human effort from creating standard
         cataloging to knowledgeable human
         intervention in machine-based processes
     O   Extensive use of data created outside libraries
     O   Intelligent re-use of our legacy data

Oslo 2013                                           2/7/13   21
Is MARC Dead?
O The communication format is very dead (based on
  standards no longer updated)
O The semantics are not dead
   O They represent the distillation of decades of
     descriptive experience
   O As we move into a more machine-assisted world,
     our old concerns about the size of our legacy can
     be addressed
   O Taking the legacy records with us should be based
     on solutions developed using open and transparent
     strategies
What’s our Distribution
                    Model?
                               We know more about
        We don’t know what    what you want than you
       you want, so choose!        do. Here it is!




Oslo 2013                                     2/7/13   23
Libraries as Data Publishers & Consumers

O Data from library „publishers‟ should look like a
  supermarket—lots of choices, with decisions made
  by consumers
   O Right now we seem to be operating as Soviet
     bakeries
   O This is not what open linked data is supposed to be
     doing for us
O "Be conservative in what you send, liberal in what
  you accept”—Robustness Principle
Our Goals as Data Publishers
O If we want people outside libraries to use our
  data, we need offer them choices
O This strategy is based on mapping all of our
  legacy data
   O Not a selection
   O Filtering accomplished by data consumers, who
     know best what they need
O This requires active innovation and a new
  understanding of how to manage the data
Our Goals as Data Consumers
     O As aggregators of relevant metadata content
            O Developing methods to gather and redistribute
              without necessarily re-creating OCLC
     O Modeling and documenting best practices in
        metadata creation, improvement and
        exposure
            O Application profiles important in this effort
     O As developers of vocabularies exposing a
       variety of bibliographic relationships
     O As innovators in using social networks to
       enhance bibliographic description
Oslo 2013                                                     2/7/13   26
Mapping Legacy Data for Re-distribution

O If we want data consumers to value our data, we
 should map it all
   O We can distribute limited „flavors‟ as well, as we
     gain experience and feedback
O Current mapping strategies are based on
   O One-time, inflexible, programmatic methods that
     effectively hide the process from consumers
   O Assumptions that data must be improved at the
     time it is mapped, or never
Oslo 2013   2/7/13   28
Oslo 2013   2/7/13   29
If we don‟t distribute our best data,
            how can anybody do cool stuff with it?



                                         Isn‟t that what
                                            we want?




                          We can use the cool stuff ourselves!
Oslo 2013                                       2/7/13   30
Oslo 2013   2/7/13   31
Oslo 2013   2/7/13   32
Oslo 2013   2/7/13   33
Harvest/Ingest Plan
     O Choosing data sources
            O There are known sources out there, some
              of them are of good quality, others are
              usable, with improvement
     O Tools are needed to help pull data,
        validate it, cache it, and set it up for
        evaluation
            O Most of these tasks can/should be set up
              with automated processes, with alerts to
              human minders when something goes
Oslo 2013     wrong                                  2/7/13   34
Oslo 2013   2/7/13   35
Metadata Evaluation
     O Evaluation needs to scale well beyond
       random sampling
     O Statistical and data mining tools need to
       be brought into the process, to provide
       both „overview‟ and specifics of whole
       data sets
     O Improvement specifications, techniques,
       quality criteria and tools need to be
       iterative, granular, and shareable
Oslo 2013                                      2/7/13   36
Oslo 2013   2/7/13   37
Testing, Monitoring & Re-
                    evaluation
     O Data will change, and processes must be able
        to detect that, based on data profiles
            O Human intervention should be limited
     O Tools need to be built so that non-
        programmers can run them
            O Reading logs, monitoring error reports,
              checking results, writing specs, can/should be
              done by data specialists (a.k.a. catalogers
              w/training)
            O Looking for opportunities for programmers and
              catalogers to learn together is essential!
Oslo 2013                                               2/7/13   38
Oslo 2013   2/7/13   39
Re-distribution Plan
     O If we improve data, we need to expose
        how we did it (and what we did), for the
        use of downstream consumers
            O New metadata provenance efforts are
              designed to do this at the statement level
     O This strategy can only exist successfully
       where open licenses allow innovation and
       wide re-use
     O Ideally, distribution AND redistribution
       should be accomplished with Application
       Profiles
Oslo 2013                                             2/7/13   40
Will This Shift Cost Too
                    Much?
     O It‟s the human effort that costs us
            O Cost of traditional cataloging is far too high, for
              increasingly dubious value
     O Our current investments have reached the
        end of their usefulness
            O All the possible efficiencies for traditional
              cataloging have already been accomplished
     O Waiting for leadership from the big players
       costs us valuable time with no guarantees of
       results
     O We need to figure out how to invest in more
       distributed innovation and focused
       collaboration
Oslo 2013                                                     2/7/13   41
What About the Millions?
     O Our legacy MARC data is already a
       „graph‟, but the resources defined there
       have no internet resolvable identity
     O But even the transcribed text can be
       hugely valuable, with effort and software
       to help
            O Projects like the eXtensible Catalog have
              made an excellent start in demonstrating
              this point
     O MARC 21 is already available as basic
Oslo 2013                                            2/7/13   42
        RDF
The Bottom Line
            O Our big investment is (and has always
              been) in our data, not our systems
            O Over many changes in format of
              materials, we‟ve always struggled to keep
              our focus on the data content that
              endures, regardless of presentation
              format
            O We are in a great position to have
              influence on how the future develops, but
              we can‟t be afraid to change, or afraid to
                                                       2/7/13
              fail
Oslo 2013                                                       43
Contact info:
            metadata.maven
             @gmail.com



                 Metadata
                 Matters:
            http://managemetadata.c
                     om/blog




Oslo 2013                             44

More Related Content

Similar to Challenges for a new era

RDF for Librarians
RDF for LibrariansRDF for Librarians
RDF for LibrariansJenn Riley
 
Owl web ontology language
Owl  web ontology languageOwl  web ontology language
Owl web ontology languagehassco2011
 
Owl web ontology language
Owl  web ontology languageOwl  web ontology language
Owl web ontology languagehassco2011
 
Dublin Core Metadata Initiative Abstract Model
Dublin Core Metadata Initiative Abstract ModelDublin Core Metadata Initiative Abstract Model
Dublin Core Metadata Initiative Abstract ModelJenn Riley
 
The RDA Vocabularies: What They Are, How They Work
The RDA Vocabularies: What They Are, How They WorkThe RDA Vocabularies: What They Are, How They Work
The RDA Vocabularies: What They Are, How They WorkDiane Hillmann
 
Diane Hillmann: RDA Vocabularies in the Semantic Web
Diane Hillmann: RDA Vocabularies in the Semantic WebDiane Hillmann: RDA Vocabularies in the Semantic Web
Diane Hillmann: RDA Vocabularies in the Semantic WebALATechSource
 
RDF and the Semantic Web -- Joanna Pszenicyn
RDF and the Semantic Web -- Joanna PszenicynRDF and the Semantic Web -- Joanna Pszenicyn
RDF and the Semantic Web -- Joanna PszenicynRichard.Sapon-White
 
CS6010 Social Network Analysis Unit II
CS6010 Social Network Analysis   Unit IICS6010 Social Network Analysis   Unit II
CS6010 Social Network Analysis Unit IIpkaviya
 
BLOGIC. (ISWC 2009 Invited Talk)
BLOGIC.  (ISWC 2009 Invited Talk)BLOGIC.  (ISWC 2009 Invited Talk)
BLOGIC. (ISWC 2009 Invited Talk)Pat Hayes
 
NISO Bibliographic Roadmap Meeting Proposal
NISO Bibliographic Roadmap Meeting ProposalNISO Bibliographic Roadmap Meeting Proposal
NISO Bibliographic Roadmap Meeting ProposalDiane Hillmann
 
Karen Coyle Keynote - R&D: Can Resource Description become Rigorous Data?
Karen Coyle Keynote - R&D: Can Resource Description become Rigorous Data?Karen Coyle Keynote - R&D: Can Resource Description become Rigorous Data?
Karen Coyle Keynote - R&D: Can Resource Description become Rigorous Data?eby
 
RDF SHACL, Annotations, and Data Frames
RDF SHACL, Annotations, and Data FramesRDF SHACL, Annotations, and Data Frames
RDF SHACL, Annotations, and Data FramesKurt Cagle
 
OODBMSvsORDBMSppt.pptx
OODBMSvsORDBMSppt.pptxOODBMSvsORDBMSppt.pptx
OODBMSvsORDBMSppt.pptxMEHMOODNadeem
 
Understanding RDF: the Resource Description Framework in Context (1999)
Understanding RDF: the Resource Description Framework in Context  (1999)Understanding RDF: the Resource Description Framework in Context  (1999)
Understanding RDF: the Resource Description Framework in Context (1999)Dan Brickley
 
The Future of Cataloging and Catalogers
The Future of Cataloging and CatalogersThe Future of Cataloging and Catalogers
The Future of Cataloging and Catalogerskramsey
 

Similar to Challenges for a new era (20)

RDF for Librarians
RDF for LibrariansRDF for Librarians
RDF for Librarians
 
Owl web ontology language
Owl  web ontology languageOwl  web ontology language
Owl web ontology language
 
Owl web ontology language
Owl  web ontology languageOwl  web ontology language
Owl web ontology language
 
Dublin Core Metadata Initiative Abstract Model
Dublin Core Metadata Initiative Abstract ModelDublin Core Metadata Initiative Abstract Model
Dublin Core Metadata Initiative Abstract Model
 
The RDA Vocabularies: What They Are, How They Work
The RDA Vocabularies: What They Are, How They WorkThe RDA Vocabularies: What They Are, How They Work
The RDA Vocabularies: What They Are, How They Work
 
Diane Hillmann: RDA Vocabularies in the Semantic Web
Diane Hillmann: RDA Vocabularies in the Semantic WebDiane Hillmann: RDA Vocabularies in the Semantic Web
Diane Hillmann: RDA Vocabularies in the Semantic Web
 
Demystifying RDF
Demystifying RDFDemystifying RDF
Demystifying RDF
 
RDF and the Semantic Web -- Joanna Pszenicyn
RDF and the Semantic Web -- Joanna PszenicynRDF and the Semantic Web -- Joanna Pszenicyn
RDF and the Semantic Web -- Joanna Pszenicyn
 
Semantics
SemanticsSemantics
Semantics
 
CS6010 Social Network Analysis Unit II
CS6010 Social Network Analysis   Unit IICS6010 Social Network Analysis   Unit II
CS6010 Social Network Analysis Unit II
 
BLOGIC. (ISWC 2009 Invited Talk)
BLOGIC.  (ISWC 2009 Invited Talk)BLOGIC.  (ISWC 2009 Invited Talk)
BLOGIC. (ISWC 2009 Invited Talk)
 
UNC visit
UNC visitUNC visit
UNC visit
 
NISO Bibliographic Roadmap Meeting Proposal
NISO Bibliographic Roadmap Meeting ProposalNISO Bibliographic Roadmap Meeting Proposal
NISO Bibliographic Roadmap Meeting Proposal
 
Aall denver 2010
Aall denver 2010Aall denver 2010
Aall denver 2010
 
Karen Coyle Keynote - R&D: Can Resource Description become Rigorous Data?
Karen Coyle Keynote - R&D: Can Resource Description become Rigorous Data?Karen Coyle Keynote - R&D: Can Resource Description become Rigorous Data?
Karen Coyle Keynote - R&D: Can Resource Description become Rigorous Data?
 
ASIS&T 2010
ASIS&T 2010ASIS&T 2010
ASIS&T 2010
 
RDF SHACL, Annotations, and Data Frames
RDF SHACL, Annotations, and Data FramesRDF SHACL, Annotations, and Data Frames
RDF SHACL, Annotations, and Data Frames
 
OODBMSvsORDBMSppt.pptx
OODBMSvsORDBMSppt.pptxOODBMSvsORDBMSppt.pptx
OODBMSvsORDBMSppt.pptx
 
Understanding RDF: the Resource Description Framework in Context (1999)
Understanding RDF: the Resource Description Framework in Context  (1999)Understanding RDF: the Resource Description Framework in Context  (1999)
Understanding RDF: the Resource Description Framework in Context (1999)
 
The Future of Cataloging and Catalogers
The Future of Cataloging and CatalogersThe Future of Cataloging and Catalogers
The Future of Cataloging and Catalogers
 

More from Diane Hillmann

RDA and Linked Data: where's the beef
RDA and Linked Data: where's the beefRDA and Linked Data: where's the beef
RDA and Linked Data: where's the beefDiane Hillmann
 
RDA: Alive and Well and Still Speaking MARC
RDA: Alive and Well and Still Speaking MARCRDA: Alive and Well and Still Speaking MARC
RDA: Alive and Well and Still Speaking MARCDiane Hillmann
 
Vocabulary Development for Local Use: A DIY Introduction
Vocabulary Development for Local Use: A DIY IntroductionVocabulary Development for Local Use: A DIY Introduction
Vocabulary Development for Local Use: A DIY IntroductionDiane Hillmann
 
What Can We Do About Our Legacy Data?
What Can We Do About Our Legacy Data?What Can We Do About Our Legacy Data?
What Can We Do About Our Legacy Data?Diane Hillmann
 
Moving to an open world
Moving to an open worldMoving to an open world
Moving to an open worldDiane Hillmann
 
Versioning for Authorities, presentation at Midwinter Chicago 2015
Versioning  for Authorities, presentation at Midwinter Chicago 2015Versioning  for Authorities, presentation at Midwinter Chicago 2015
Versioning for Authorities, presentation at Midwinter Chicago 2015Diane Hillmann
 
RDA as linked data (RDA Forum)
RDA as linked data (RDA Forum)RDA as linked data (RDA Forum)
RDA as linked data (RDA Forum)Diane Hillmann
 
What is an RDA Record?
What is an RDA Record?What is an RDA Record?
What is an RDA Record?Diane Hillmann
 
Oregon State visit 2011
Oregon State visit 2011Oregon State visit 2011
Oregon State visit 2011Diane Hillmann
 
RDA & the New World of Metadata
RDA & the New World of MetadataRDA & the New World of Metadata
RDA & the New World of MetadataDiane Hillmann
 
The Other Side of Linked Open Data: Managing Metadata Aggregation
The Other Side of Linked Open Data: Managing Metadata AggregationThe Other Side of Linked Open Data: Managing Metadata Aggregation
The Other Side of Linked Open Data: Managing Metadata AggregationDiane Hillmann
 
A Consideration of Library Holdings in the World Beyond MARC
A Consideration of Library Holdings in the World Beyond MARCA Consideration of Library Holdings in the World Beyond MARC
A Consideration of Library Holdings in the World Beyond MARCDiane Hillmann
 
Maps & gaps: strategies for vocabulary design and development
Maps & gaps: strategies for vocabulary design and developmentMaps & gaps: strategies for vocabulary design and development
Maps & gaps: strategies for vocabulary design and developmentDiane Hillmann
 
New World of Metadata: Growing, Shifting, Merging
New World of Metadata: Growing, Shifting, MergingNew World of Metadata: Growing, Shifting, Merging
New World of Metadata: Growing, Shifting, MergingDiane Hillmann
 

More from Diane Hillmann (20)

RDA and Linked Data: where's the beef
RDA and Linked Data: where's the beefRDA and Linked Data: where's the beef
RDA and Linked Data: where's the beef
 
RDA: Alive and Well and Still Speaking MARC
RDA: Alive and Well and Still Speaking MARCRDA: Alive and Well and Still Speaking MARC
RDA: Alive and Well and Still Speaking MARC
 
Vocabulary Development for Local Use: A DIY Introduction
Vocabulary Development for Local Use: A DIY IntroductionVocabulary Development for Local Use: A DIY Introduction
Vocabulary Development for Local Use: A DIY Introduction
 
What Can We Do About Our Legacy Data?
What Can We Do About Our Legacy Data?What Can We Do About Our Legacy Data?
What Can We Do About Our Legacy Data?
 
Moving to an open world
Moving to an open worldMoving to an open world
Moving to an open world
 
Why change?
Why change?Why change?
Why change?
 
Versioning for Authorities, presentation at Midwinter Chicago 2015
Versioning  for Authorities, presentation at Midwinter Chicago 2015Versioning  for Authorities, presentation at Midwinter Chicago 2015
Versioning for Authorities, presentation at Midwinter Chicago 2015
 
RDA as linked data (RDA Forum)
RDA as linked data (RDA Forum)RDA as linked data (RDA Forum)
RDA as linked data (RDA Forum)
 
What's goin' on?
What's goin' on?What's goin' on?
What's goin' on?
 
Playing with Jane
Playing with JanePlaying with Jane
Playing with Jane
 
What is an RDA Record?
What is an RDA Record?What is an RDA Record?
What is an RDA Record?
 
Oregon State visit 2011
Oregon State visit 2011Oregon State visit 2011
Oregon State visit 2011
 
RDA & the New World of Metadata
RDA & the New World of MetadataRDA & the New World of Metadata
RDA & the New World of Metadata
 
The Other Side of Linked Open Data: Managing Metadata Aggregation
The Other Side of Linked Open Data: Managing Metadata AggregationThe Other Side of Linked Open Data: Managing Metadata Aggregation
The Other Side of Linked Open Data: Managing Metadata Aggregation
 
Mapmakers
MapmakersMapmakers
Mapmakers
 
A Consideration of Library Holdings in the World Beyond MARC
A Consideration of Library Holdings in the World Beyond MARCA Consideration of Library Holdings in the World Beyond MARC
A Consideration of Library Holdings in the World Beyond MARC
 
Maps & gaps: strategies for vocabulary design and development
Maps & gaps: strategies for vocabulary design and developmentMaps & gaps: strategies for vocabulary design and development
Maps & gaps: strategies for vocabulary design and development
 
Lossless MARC Mapping
Lossless MARC MappingLossless MARC Mapping
Lossless MARC Mapping
 
New World of Metadata: Growing, Shifting, Merging
New World of Metadata: Growing, Shifting, MergingNew World of Metadata: Growing, Shifting, Merging
New World of Metadata: Growing, Shifting, Merging
 
Managing statements
Managing statementsManaging statements
Managing statements
 

Challenges for a new era

  • 1. Challenges for the New Era Diane I. Hillmann Metadata Management Associates Oslo, February 8, 2013
  • 2. Big Challenges/Big Ideas O Changing our thinking from records to statements O Will RDA help? O Where you start affects where you end up O Shifting our ways from ROI to Potlach O Recognizing that our human resources are limited O So how do we manage this data-that-isn‟t records? Oslo 2013 2/7/13 2
  • 3. Statements and Records O Records are still important but not as we‟ve used them in the past O We might want to think about records as the instantiation of a point of view O News: traditional library data has a point of view O MARC required consensus because of limitations built into the technology O Now we need provenance, so we know “Who said?” Oslo 2013 2/7/13 3
  • 4. Building RDVocab: Goals O Bridge the XML and RDF worlds O Ensure ability to map between RDA and other element sets O Provide a sound platform for extension of RDA Vocabularies into new and specialized domains O Consider methods for expressing AACR2 structures in technical ways to ease the pain of transition to RDA Oslo 2013 2/7/13 4
  • 5. RDVocab Structure, Simplified O RDA Properties declared in two separate hierarchies: O An „unconstrained‟ vocabulary, with no explicit relationship to FRBR entities O A subset of classes, properties and subproperties with FRBR entities as „domains‟ O Pros: retained usability in or out of libraries; better mapping to/from non- FRBR vocabularies O Cons: still seems too complex to many SemWeb implementers (many using BIBO) Oslo 2013 2/7/13 5
  • 6. Why Unconstrained Properties? O The „bounded‟ properties should be seen as the official JSC-defined RDA Application Profile for libraries O What‟s still lacking is the addition of the necessary constraints: datatypes, cardinality, associated value vocabularies O Extensions and mapping should be built from the unconstrained properties O Unconstrained vocabularies necessary for use in domains where FRBR not assumed or inappropriate O Mapping from vocabularies not using the FRBR model directly to ones that do (and back) creates serious problems for the „Web of Data‟ Oslo 2013 2/7/13 6
  • 7. Property (Generalized, no FRBR Semantic relationship) Web Subproperty (with relationship to one FRBR entity) FRBR Entity The Simple Case: Library Applications One Property-- One FRBR Entity Oslo 2013 2/7/13 7
  • 8. Property (Generalized, no FRBR relationship) Semantic Web Subproperty (with relationship to one FRBR entity) FRBR Entity Subproperty (with relationship to one FRBR entity) FRBR Entity Library Applications The Not-So-Simple Case: One Property—more than Oslo 2013 One FRBR Entity 8 2/7/13
  • 9. Roles: Attributes or Properties? O In 2005, the DC Usage Board worked with LC to build a formal representation of the MARC Relators so that these terms could be used with DC O This work provided a template for the registration of the role terms in RDA (in Appendix I) and, by extension, the other RDA relationships O Role and relationship properties are registered at the same level as elements, rather than as attributes (as MARC does with relators, and RDA does in its XML schemas) Oslo 2013 2/7/13 9
  • 10. Vocabulary Extension O The inclusion of unconstrained properties provides a path for extension of RDA into specialized library communities and non- library communities O They may have a different notion of how FRBR „aggregates‟ (For example, a colorized version of a film may be viewed as a separate work) O They may not wish to use FRBR at all O They may have additional, domain-specific properties to add, that could benefit from a relationship to the RDA properties Oslo 2013 2/7/13 10
  • 11. RDA:adaptedAs RDA:adaptedAsARadioScript Oslo 2013 2/7/13 11
  • 12. RDA:adaptedAs RDA:adaptedAsARadioScri pt KidLit:adaptedAsAPictureBo ok Extension using Unconstrained Properties Oslo 2013 2/7/13 12
  • 13. RDA:adapted As RDA:adaptedAsARadioScr ipt KidLit:adaptedAsAPictureBo ok KidLit:adaptedAsAChapterBo ok Extension using Unconstrained Properties Oslo 2013 2/7/13 13
  • 14. Where you start affects where you end up O Simple metadata is more useful as output than input O The „long tail‟ of MARC‟s lesser used properties was built up over decades and shouldn‟t be discarded O Easier to dumb down than smarten up O Dublin Core and MARC examples of starting simple and trying to add on O Distribution models are important Oslo 2013 2/7/13 14
  • 15. Values vs. Costs O Machines cost less than people, but they can‟t replace people O Computers tend to require instructions from people to work well O But they are more consistent than people! O ROI culture vs. Potlatch culture O Is „who pays for this?‟ the right question? Oslo 2013 2/7/13 15
  • 16. The Management Conundrum O Traditional ILS‟s haven‟t worked for us for a long time O They were built to create and manage catalog data O We can no longer invest in the catalog paradigm O Libraries are data builders, data managers, data distributors O The centralized, master record model is as dead as MARC encoding Oslo 2013 2/7/13 16
  • 17.
  • 18. Linked Data is Inherently Chaotic O Requires creating and aggregating data in a broader context O There is no one „correct‟ record to be made from this, no objective „truth‟ O This approach is different from the cataloging tradition O BUT, the focus on vocabularies is familiar O In the SemWeb world vocabularies are more complex than the thesauri we know Oslo 2013 2/7/13 18
  • 19. Model of ‘the World’ /XML O XML assumes a 'closed' world (domain), usually defined by a schema: O "We know all of the data describing this resource. The single description must be a valid document according to our schema. The data must be valid.” O XML's document model provides a neat equivalence to a metadata 'record‟ (and most of us are fairly comfortable with it) Oslo 2013 2/7/13 19
  • 20. Model of ‘the World’ /RDF O RDF assumes an 'open' world: O "There's an infinite amount of unknown data describing this resource yet to be discovered. It will come from an infinite number of providers. There will be an infinite number of descriptions. Those descriptions must be consistent." O RDF's statement-oriented data model has no notion of 'record‟ (rather, statements can be aggregated for a fuller description of a resource) Oslo 2013 2/7/13 20
  • 21. The New Management Strategy O Statement level rather than record level management O Emphasis on evaluation coming in and provenance going out O Shift in human effort from creating standard cataloging to knowledgeable human intervention in machine-based processes O Extensive use of data created outside libraries O Intelligent re-use of our legacy data Oslo 2013 2/7/13 21
  • 22. Is MARC Dead? O The communication format is very dead (based on standards no longer updated) O The semantics are not dead O They represent the distillation of decades of descriptive experience O As we move into a more machine-assisted world, our old concerns about the size of our legacy can be addressed O Taking the legacy records with us should be based on solutions developed using open and transparent strategies
  • 23. What’s our Distribution Model? We know more about We don’t know what what you want than you you want, so choose! do. Here it is! Oslo 2013 2/7/13 23
  • 24. Libraries as Data Publishers & Consumers O Data from library „publishers‟ should look like a supermarket—lots of choices, with decisions made by consumers O Right now we seem to be operating as Soviet bakeries O This is not what open linked data is supposed to be doing for us O "Be conservative in what you send, liberal in what you accept”—Robustness Principle
  • 25. Our Goals as Data Publishers O If we want people outside libraries to use our data, we need offer them choices O This strategy is based on mapping all of our legacy data O Not a selection O Filtering accomplished by data consumers, who know best what they need O This requires active innovation and a new understanding of how to manage the data
  • 26. Our Goals as Data Consumers O As aggregators of relevant metadata content O Developing methods to gather and redistribute without necessarily re-creating OCLC O Modeling and documenting best practices in metadata creation, improvement and exposure O Application profiles important in this effort O As developers of vocabularies exposing a variety of bibliographic relationships O As innovators in using social networks to enhance bibliographic description Oslo 2013 2/7/13 26
  • 27. Mapping Legacy Data for Re-distribution O If we want data consumers to value our data, we should map it all O We can distribute limited „flavors‟ as well, as we gain experience and feedback O Current mapping strategies are based on O One-time, inflexible, programmatic methods that effectively hide the process from consumers O Assumptions that data must be improved at the time it is mapped, or never
  • 28. Oslo 2013 2/7/13 28
  • 29. Oslo 2013 2/7/13 29
  • 30. If we don‟t distribute our best data, how can anybody do cool stuff with it? Isn‟t that what we want? We can use the cool stuff ourselves! Oslo 2013 2/7/13 30
  • 31. Oslo 2013 2/7/13 31
  • 32. Oslo 2013 2/7/13 32
  • 33. Oslo 2013 2/7/13 33
  • 34. Harvest/Ingest Plan O Choosing data sources O There are known sources out there, some of them are of good quality, others are usable, with improvement O Tools are needed to help pull data, validate it, cache it, and set it up for evaluation O Most of these tasks can/should be set up with automated processes, with alerts to human minders when something goes Oslo 2013 wrong 2/7/13 34
  • 35. Oslo 2013 2/7/13 35
  • 36. Metadata Evaluation O Evaluation needs to scale well beyond random sampling O Statistical and data mining tools need to be brought into the process, to provide both „overview‟ and specifics of whole data sets O Improvement specifications, techniques, quality criteria and tools need to be iterative, granular, and shareable Oslo 2013 2/7/13 36
  • 37. Oslo 2013 2/7/13 37
  • 38. Testing, Monitoring & Re- evaluation O Data will change, and processes must be able to detect that, based on data profiles O Human intervention should be limited O Tools need to be built so that non- programmers can run them O Reading logs, monitoring error reports, checking results, writing specs, can/should be done by data specialists (a.k.a. catalogers w/training) O Looking for opportunities for programmers and catalogers to learn together is essential! Oslo 2013 2/7/13 38
  • 39. Oslo 2013 2/7/13 39
  • 40. Re-distribution Plan O If we improve data, we need to expose how we did it (and what we did), for the use of downstream consumers O New metadata provenance efforts are designed to do this at the statement level O This strategy can only exist successfully where open licenses allow innovation and wide re-use O Ideally, distribution AND redistribution should be accomplished with Application Profiles Oslo 2013 2/7/13 40
  • 41. Will This Shift Cost Too Much? O It‟s the human effort that costs us O Cost of traditional cataloging is far too high, for increasingly dubious value O Our current investments have reached the end of their usefulness O All the possible efficiencies for traditional cataloging have already been accomplished O Waiting for leadership from the big players costs us valuable time with no guarantees of results O We need to figure out how to invest in more distributed innovation and focused collaboration Oslo 2013 2/7/13 41
  • 42. What About the Millions? O Our legacy MARC data is already a „graph‟, but the resources defined there have no internet resolvable identity O But even the transcribed text can be hugely valuable, with effort and software to help O Projects like the eXtensible Catalog have made an excellent start in demonstrating this point O MARC 21 is already available as basic Oslo 2013 2/7/13 42 RDF
  • 43. The Bottom Line O Our big investment is (and has always been) in our data, not our systems O Over many changes in format of materials, we‟ve always struggled to keep our focus on the data content that endures, regardless of presentation format O We are in a great position to have influence on how the future develops, but we can‟t be afraid to change, or afraid to 2/7/13 fail Oslo 2013 43
  • 44. Contact info: metadata.maven @gmail.com Metadata Matters: http://managemetadata.c om/blog Oslo 2013 44

Editor's Notes

  1. Some conflicts here
  2. This way of handling roles should be usable in either XML or RDF (essential for RDF)
  3. Smörgåsbord Norway it is called koldtbord
  4. This practice makes it difficult for community members to effectively respond to decisions made behind the curtain or to contribute to better maps
  5. (a la the open source OPAC replacements)