From Records To Statements: Taking the Leap
What’s different about statement data?

Library data compliance has been defined by consensus since MARC was a pup.
But outside the MARC silo we need different strategies.
To accomplish this we need to look at value, costs and investments very differently.

Flickr photo by Robert Jagendorf. ALA Dallas, 1/20/12
What Are Statements?

• A MARC record can be viewed as an aggregation
  of statements
  • All the attribute = value pairs relate to the same
    resource

• In a linked data world, statements are disaggregated,
  and each carries its relationship to a
  resource as the ‘subject’ of its triple
• Though it seems more complicated to deal with
  statements in isolation, it is really simpler (the
  complication is that we still know little about doing it)
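
As a rough illustration of the bullets above, here is a minimal Python sketch of disaggregating one record into standalone statements. The `Statement` type, the `disaggregate` function, the example.org identifier and the sample record are all assumptions made for illustration; real linked data would use an RDF vocabulary and URIs for the predicates as well.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One subject / predicate / value triple (hypothetical structure)."""
    subject: str    # identifier of the resource being described
    predicate: str  # the attribute
    value: str      # the value of that attribute


def disaggregate(resource_id: str, record: dict[str, list[str]]) -> list[Statement]:
    """Turn a record (attribute -> values) into independent statements.

    In the record, the resource is implied by the container. In each
    statement it is stated explicitly as the subject.
    """
    return [
        Statement(resource_id, attribute, value)
        for attribute, values in record.items()
        for value in values
    ]


# Hypothetical record: every attribute = value pair is about the same resource.
record = {
    "title": ["Moby Dick"],
    "creator": ["Melville, Herman"],
    "subject": ["Whaling", "Sea stories"],
}

statements = disaggregate("http://example.org/book/1", record)
for statement in statements:
    print(statement)
```

Each statement can now be evaluated, improved or redistributed on its own, because it no longer depends on the record to say what it is about.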

Future Metadata Strategies
• Statement level rather than record level management
• Records as units of transport rather than units of
  management
• Emphasis on evaluation coming in and provenance
  going out
• Shift in human effort from creating standard cataloging
  to careful human intervention in machine-based
  processes
• Extensive use of data created outside libraries
• Intelligent re-use of our legacy data
Managing Statements




       http://dcpapers.dublincore.org/ojs/pubs/article/view/770/766
[Possible] New Roles for Librarians
• Aggregators of relevant metadata content
  • Developing methods to expose & redistribute without a
    central node

• Modeling and documenting best practices in metadata
  creation, improvement and exposure
  • Application profiles important in this effort

• Developers of vocabularies using bibliographic
  relationships
• Innovators in using social networks to enhance
  bibliographic description


Re-Thinking Metadata Management




Harvest/Ingest Plan

• Choosing data sources
  • There are known sources out there: some are of
    good quality, others are usable with
    improvement

• Tools are needed to help pull data, validate it,
  cache it, and set it up for evaluation
  • Most of these tasks can/should be set up with
    automated processes, with alerts to human minders
    when something goes wrong
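
The pull / validate / cache / alert sequence described above might be sketched as follows. This is only an outline under stated assumptions: the `harvest` function, the `REQUIRED_FIELDS` rule, the 20% rejection threshold and the `fake_pull` source are hypothetical, and a real harvester would pull over a protocol such as OAI-PMH rather than from an in-memory list.

```python
import json
import logging
import tempfile
from pathlib import Path
from typing import Callable, Iterable

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("harvest")

# Assumed minimal validity rule: these fields must be present and non-empty.
REQUIRED_FIELDS = ("id", "title")


def is_valid(record: dict) -> bool:
    return all(record.get(name) for name in REQUIRED_FIELDS)


def harvest(
    source_name: str,
    pull: Callable[[], Iterable[dict]],
    cache_dir: Path,
    max_reject_rate: float = 0.2,
) -> dict:
    """Pull records, keep the valid ones, cache them, alert on trouble."""
    records = list(pull())
    valid = [r for r in records if is_valid(r)]
    rejected = len(records) - len(valid)

    # Cache the valid records so evaluation can run without re-pulling.
    cache_file = cache_dir / f"{source_name}.json"
    cache_file.write_text(json.dumps(valid, indent=2), encoding="utf-8")

    # Alert a human minder only when the rejection rate looks wrong.
    if records and rejected / len(records) > max_reject_rate:
        log.warning(
            "%s: %d of %d records rejected; needs a human look",
            source_name, rejected, len(records),
        )
    return {"pulled": len(records), "cached": len(valid), "rejected": rejected}


def fake_pull() -> list[dict]:
    """Stand-in for a real data source (hypothetical sample data)."""
    return [
        {"id": "1", "title": "Moby Dick"},
        {"id": "2", "title": ""},  # fails validation: empty title
        {"id": "3", "title": "Walden"},
    ]


with tempfile.TemporaryDirectory() as tmp:
    report = harvest("example-source", fake_pull, Path(tmp))
    cached = json.loads(
        (Path(tmp) / "example-source.json").read_text(encoding="utf-8")
    )

print(report)
```

The human only hears from the process when the rejection rate crosses the threshold; routine runs just produce a report.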


Metadata Evaluation

• Evaluation needs to scale well beyond random
  sampling

• Statistical and data mining tools need to be
  brought into the process, to provide both
  ‘overview’ and specifics of whole data sets

• Improvement specifications, techniques, quality
  criteria and tools need to be iterative, granular,
  and shareable
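
To make "overview and specifics of whole data sets" concrete, here is a small Python sketch that computes two simple measures over every record rather than a sample. The function names, the sample records and the choice of measures (field completeness and value distribution) are assumptions for illustration; they stand in for whatever statistical or data mining tools a project actually adopts.

```python
from collections import Counter


def field_completeness(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Overview: share of records with a non-empty value for each field."""
    total = len(records)
    return {
        name: (sum(1 for r in records if r.get(name)) / total if total else 0.0)
        for name in fields
    }


def value_distribution(records: list[dict], field: str) -> Counter:
    """Specifics: how often each value of one field occurs."""
    return Counter(r[field] for r in records if r.get(field))


# Hypothetical data set; a real one would be the whole harvested cache.
records = [
    {"title": "Moby Dick", "language": "eng"},
    {"title": "Walden", "language": "eng"},
    {"title": "Emma", "language": "en"},
    {"title": "Ulysses"},
]

completeness = field_completeness(records, ["title", "language"])
languages = value_distribution(records, "language")

print(completeness)
print(languages.most_common())
```

Here the overview shows that a quarter of the records lack a language, and the distribution exposes an inconsistent code ("en" next to "eng") that sampling could easily miss.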


Testing, Monitoring & Re-evaluation

• Data will change, and processes must be able to
  detect that, based on data profiles
  • Human intervention should be limited

• Tools need to be built so that non-programmers
  can run them
  • Reading logs, monitoring error reports, checking
    results and writing specs can/should be done by data
    specialists (a.k.a. catalogers with training)
  • Looking for opportunities for programmers and
    catalogers to learn together is essential
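
One possible reading of "detect that, based on data profiles" is sketched below in Python: store a profile of each harvest and compare the next harvest against it. The profile used here (per-field completeness), the `drift` function and its 10% tolerance are assumptions, not something the slides specify.

```python
def profile(records: list[dict], fields: list[str]) -> dict[str, float]:
    """A very small data profile: completeness of each field (0.0 to 1.0)."""
    total = len(records)
    return {
        name: (sum(1 for r in records if r.get(name)) / total if total else 0.0)
        for name in fields
    }


def drift(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.1,
) -> dict[str, tuple[float, float]]:
    """Return the fields whose profile moved by more than the tolerance."""
    return {
        name: (baseline[name], current.get(name, 0.0))
        for name in baseline
        if abs(baseline[name] - current.get(name, 0.0)) > tolerance
    }


FIELDS = ["title", "date"]

# Hypothetical previous and current harvests from the same source.
last_harvest = [
    {"title": "A", "date": "1851"},
    {"title": "B", "date": "1854"},
    {"title": "C", "date": "1815"},
    {"title": "D", "date": "1922"},
]
new_harvest = [
    {"title": "A", "date": "1851"},
    {"title": "B"},
    {"title": "C", "date": "1815"},
    {"title": "D"},
]

changes = drift(profile(last_harvest, FIELDS), profile(new_harvest, FIELDS))
print(changes)
```

An empty result means nobody needs to look; a non-empty one is the kind of short report a data specialist could read and act on without programming.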

Re-distribution Plan

• If we improve data, we need to expose how we
  did it (and what we did), for the use of
  downstream consumers
  • New metadata provenance efforts are designed to do
    this at the statement level

• This strategy can only exist successfully where
  open licenses allow innovation and wide re-use

• Ideally, distribution AND redistribution should be
  accomplished with Application Profiles
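
A minimal Python sketch of exposing "how we did it (and what we did)" at the statement level follows. The `Statement` and `Provenance` classes and the `improve` function are hypothetical and are not the provenance model the slide refers to; they only show the idea that each improvement travels with the statement it changed.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class Provenance:
    """What was done to one statement, by whom, and when (assumed fields)."""
    activity: str
    agent: str
    previous_value: str
    recorded_at: str


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    value: str
    history: tuple[Provenance, ...] = ()


def improve(stmt: Statement, new_value: str, activity: str, agent: str) -> Statement:
    """Return an improved copy of the statement with the change recorded."""
    step = Provenance(
        activity=activity,
        agent=agent,
        previous_value=stmt.value,
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )
    return replace(stmt, value=new_value, history=stmt.history + (step,))


# Hypothetical statement and improvement.
original = Statement("http://example.org/book/1", "language", "en")
improved = improve(
    original,
    new_value="eng",
    activity="normalized language code to ISO 639-2",
    agent="example-library normalizer",
)

print(improved.value)
for step in improved.history:
    print(step.activity, "| was:", step.previous_value, "| by:", step.agent)
```

A downstream consumer receiving `improved` can see the earlier value and decide whether to trust the change, which is the point of sending provenance out with the data.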

Will This Shift Cost Too Much?
• It’s the human effort that costs us
  • Cost of traditional cataloging is far too high, for
    increasingly dubious value

• Our current investments have reached the end of their
  usefulness
  • All the possible efficiencies for traditional cataloging have
    already been accomplished

• Waiting for leadership from the big players costs us
  valuable time with no guarantees of results
• We need to figure out how to invest in more distributed
  innovation and focused collaboration

ROI in the LOD World

• Free metadata is essential in a ‘culture economy’
  • We need eyeballs, attention, connection for our
    content!

• Thinking about ROI based on recovering the cost
  of creating metadata is a dead end

• To drive people to your content, you need to put
  your data out there
  • But once it’s there, it’s out of your control, and we
    need to get comfortable with that

Thank you!
 Questions?
Contact info: metadata.maven@gmail.com

Metadata Matters: http://managemetadata.com/blog





Editor's Notes

  • #17, “ROI in the LOD World” (a la the open source OPAC replacements)