Metadata Training for Staff and Librarians for the New Data Environment


Presented at the 2011 DLF Forum in Baltimore, Maryland.

  • No absolutes with metadata – all relative to what you need to do with your data. Can be very different for different applications.
  • Data facilitates information gathering. It can highlight information, as in the use of facets or topic maps. These are connections that cannot be made manually in a reasonable amount of time, and therefore are connections that users do not see without the help of machines.
  • The open world assumption is the opposite of the closed world assumption, which holds that any statement that is not known to be true is false.
  • Transcript

    • 1.  
    • 2. Today’s Task
      • Part 1: Audiences, current training strategies, cost-effectiveness
      • Part 2: A taste of the training
        • “From Metadata to a Web of Data”
      • Part 3: Structured feedback session
        • Can you help us make this better?
      DLF Forum, Nov. 2, 2011
    • 3. Why Are We Doing This?
      • Increasing frustration with webinars
        • Not particularly good for anything but introductions
        • Very few opportunities for interaction or follow-up
      • One-day seminars at various institutions and conferences also seem limited in terms of participation
      • The ‘older’ model of repeatable workshops (with a group of trainers) is still useful if tweaked
        • Better opportunities for participation and learning
    • 4. Goals
      • Offer direct training for libraries in a format that encourages participatory learning
        • Building on the successful library workshop model is one option
      • Encourage other library organizations and conference planners to include training options in their regular meetings
        • Generally requires members to lobby for workshops, pre-conferences, etc.
    • 5. Part I: Intro to Metadata
      • Questions:
        • Do we have a shared understanding of metadata?
        • What are some of the practical definitions and modes of thinking that you can use in practice?
        • What is the basis for understanding the technology context of today’s data?
    • 6. Intro to Metadata
      • What is metadata?
        • not: data about data
      • Instead: Data with a purpose
        • constructed (human-made, artificial)
        • constructive (designed for a purpose, not theoretical)
        • computable (all metadata today will be used by computer applications as well as managed and understood by humans)
    • 7. Exercise 1: Data With a Purpose
      • Each group has a book on the table. What metadata is needed for:
        • A warehouse that will ship books to bookstores
        • A brick-and-mortar bookstore that orders books, displays and sells them
        • An online bookstore that will take orders and ship books to customers
      • Look over your lists: it will cost you $1 for every metadata field you create. If you use a field in your operation, you get the $1 back
        • Have you changed your mind?
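The $1 rule in this exercise is simple bookkeeping. A minimal sketch, assuming Python 3.9+; the function `net_field_cost` and the field lists are hypothetical, standing in for what one group might write down for the warehouse scenario:

```python
def net_field_cost(created: set[str], used: set[str], price: int = 1) -> int:
    """Dollars spent: `price` per field created, refunded for each created field that is used."""
    refunded = created & used  # a field must exist before it can earn a refund
    return price * (len(created) - len(refunded))


# Hypothetical lists for the warehouse scenario.
warehouse_created = {"isbn", "title", "author", "weight", "dimensions", "subject"}
warehouse_used = {"isbn", "title", "weight", "dimensions"}

print(net_field_cost(warehouse_created, warehouse_used))  # 2
```

Here `author` and `subject` cost money because a warehouse shipping books never acts on them; the same fields would likely earn their refund in the bookstore scenarios.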
    • 8.  
    • 9. Part II: Understanding DATA
      • Goals:
        • Understand the difference between data and text by thinking about computability
        • Learn some basic data types
        • Recognize data types in library data
    • 10. Standard Data Types
      • Text – ‘text’ (we know this one!)
      • Defined data types:
        • Date (& time)
        • Currency
        • Numbers (integers, etc.)
      • Controlled lists: finite sets of values to use
        • Languages (ISO)
        • Countries (ISO)
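One way to see the difference between text and defined data types is to convert incoming text into typed values. A sketch under stated assumptions: the function `parse_record`, the field names, the input values, and the four-code sample standing in for the full ISO 639-2 list are all illustrative; only the standard-library `datetime` and `decimal` modules are real APIs here.

```python
from datetime import date
from decimal import Decimal

# Hypothetical sample of ISO 639-2 (bibliographic) language codes standing in for the full list.
ISO_639_2_SAMPLE = {"eng", "fre", "ger", "spa"}


def parse_record(raw: dict[str, str]) -> dict[str, object]:
    """Turn text fields into typed values; raises an error when a value does not fit its type."""
    language = raw["language"]
    if language not in ISO_639_2_SAMPLE:
        raise ValueError(f"{language!r} is not in the controlled list")
    return {
        "title": raw["title"],                               # text: stored as-is
        "published": date.fromisoformat(raw["published"]),   # date
        "price": Decimal(raw["price"]),                       # currency: exact decimal, not float
        "pages": int(raw["pages"]),                           # integer
        "language": language,                                 # value from a controlled list
    }


# Illustrative input values.
record = parse_record({"title": "Moby Dick", "published": "1851-10-18",
                       "price": "12.99", "pages": "635", "language": "eng"})
print(record["published"].year, record["price"] + Decimal("1.00"))  # 1851 13.99
```

Once typed, the date can be compared and the price added to without string tricks, and a misspelled language code is rejected instead of silently stored.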
    • 11. Why Data?
      • Enables machine processing of amounts of data too large for humans to grasp (which is just about all of our information)
        • processing across patron files or the bibliographic database
        • processing on retrieved sets (e.g. extracting facets)
      • Enables libraries to move beyond ‘artisanal metadata’ towards more efficient and cost-effective assignment of tasks to humans and machines
        • Comes with new sources of data and new collaborations
    • 12. Data Use Examples
      • Making decisions
        • If someone has been a user for more than 5 years, then …
        • If book height greater than x, then …
      • Making connections
        • These books have the same author
        • These books have the same (or a similar) topic
        • These CDs have the same orchestra
        • This place of publication has lat/long info and can be located on a map
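The decision and connection examples can be sketched over a few in-memory records. Everything here is hypothetical: the patron dates, book heights, thresholds, and the `ex:` author identifiers are made up for illustration, and “more than 5 years” is approximated as at least five full years:

```python
from collections import defaultdict
from datetime import date

# Hypothetical data: patron -> registration date, and books whose "author" is an identifier.
patrons = {"p1": date(2003, 5, 1), "p2": date(2010, 9, 15)}
books = [
    {"id": "b1", "author": "ex:melville", "height_cm": 24},
    {"id": "b2", "author": "ex:melville", "height_cm": 35},
    {"id": "b3", "author": "ex:thoreau", "height_cm": 21},
]


def full_years(start: date, end: date) -> int:
    """Whole years elapsed between two dates."""
    return end.year - start.year - ((end.month, end.day) < (start.month, start.day))


def long_term_patrons(today: date, years: int = 5) -> list[str]:
    """Decision: patrons who have been users for at least `years` full years."""
    return sorted(p for p, since in patrons.items() if full_years(since, today) >= years)


def oversize(max_height_cm: int) -> list[str]:
    """Decision: books taller than a shelf limit."""
    return [b["id"] for b in books if b["height_cm"] > max_height_cm]


def books_by_author() -> dict[str, list[str]]:
    """Connection: group books that share the same author identifier."""
    groups: dict[str, list[str]] = defaultdict(list)
    for b in books:
        groups[b["author"]].append(b["id"])
    return dict(groups)


print(long_term_patrons(date(2011, 11, 2)))  # ['p1']
print(oversize(30))                          # ['b2']
print(books_by_author())  # {'ex:melville': ['b1', 'b2'], 'ex:thoreau': ['b3']}
```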
    • 13.  
    • 14. Things: What Your Metadata Talks About
      • Book
      • Author
      • Place
      • Person (in subject)
      • Historical period
      • All of these exist outside your metadata, and are independent of it
        • You can talk about these ‘things’ in many different contexts
      • If you assign them identifiers that can be shared with others, then you have a ‘thing’ or entity
        • Things become points of connection between metadata descriptions (e.g., all books by the same author)
    • 15. Strings: Limited Connections
      • Metadata statements using strings don’t represent (to machines) something outside the metadata
        • They aren’t linkable to other things or strings
        • They often can’t be effectively parsed by machines
      • Transcribed data in traditional library metadata is often ‘strings’
        • Titles are good examples
      • Some strings are intended to identify something else (controlled author names, for instance) but may be used for display as well
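A small sketch of why strings limit connections, assuming two hypothetical records; the `example.org` URIs are placeholders for whatever shared authority identifiers a real system would use:

```python
# Hypothetical records: both describe the same author, once with an inverted name string.
# The author_id values are placeholder URIs standing in for shared authority identifiers.
records = [
    {"title": "Moby Dick", "author_name": "Melville, Herman",
     "author_id": "http://example.org/person/melville"},
    {"title": "Typee", "author_name": "Herman Melville",
     "author_id": "http://example.org/person/melville"},
]

# Strings: a machine only sees two different sequences of characters.
same_by_string = records[0]["author_name"] == records[1]["author_name"]
# Things: a shared identifier gives the machine a point of connection.
same_by_id = records[0]["author_id"] == records[1]["author_id"]

print(same_by_string, same_by_id)  # False True
```

String matching could be patched with normalization rules, but each rule is a guess; the shared identifier needs no guessing.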
    • 16. Exercise 2: Things & Strings
      • Start with a simple file
      • Each group has a ‘record’ (BBC, etc., not MARC)
      • A general description of the purpose of the data is provided
      • Tasks:
        • Pick out the strings and things in your example
        • Bonus points: any data types?
        • Reporting by groups and discussion
    • 17.  
    • 18. Identifiers
      • Uniquely identify a variety of resources
        • On the web they use http and domain names
      • Advantages
        • Language independent
        • Display independent
        • Unambiguous
      • Usage should be oriented towards machines, hidden from humans
        • Humans have different requirements
    • 19. Identifiers: What They Identify
      • It is easier to attach an identifier than to understand what it actually identifies
        • ISBN – identifies publisher’s product
        • LCCN – identifies LC-created metadata; ≠ ISBN, even though it may have metadata very similar to the publisher’s
        • DOI – identifies item in DOI system, but may link to a general sales page
    • 20. Identifiers must …
      • Be unique within a domain (private db; web)
      • Be consistent (identifier must always ID the same thing; DO NOT RE-USE!)
      • Be persistent (must live as long as thing it identifies)
      • Be in a standard format
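The four requirements can be made concrete with a toy identifier registry. This is a hypothetical sketch, not any real system: the class name, the `ex:` prefix, and the eight-digit format are assumptions. Uniqueness and non-reuse come from a counter that only moves forward, and a retired identifier fails to resolve rather than pointing at something new:

```python
import itertools


class IdentifierRegistry:
    """Toy in-memory registry showing unique, consistent, persistent identifiers.

    The "ex:" prefix and eight-digit format are hypothetical stand-ins for a real standard format.
    """

    def __init__(self, prefix: str = "ex:") -> None:
        self._prefix = prefix
        self._counter = itertools.count(1)    # only moves forward, so numbers are never reissued
        self._assigned: dict[str, str] = {}   # identifier -> the thing it identifies
        self._retired: set[str] = set()       # withdrawn identifiers; resolving them fails

    def mint(self, thing: str) -> str:
        """Assign a fresh identifier in one fixed format: prefix plus eight digits."""
        ident = f"{self._prefix}{next(self._counter):08d}"
        self._assigned[ident] = thing
        return ident

    def resolve(self, ident: str) -> str:
        """Return the thing an identifier names; the mapping never changes once minted."""
        if ident in self._retired:
            raise KeyError(f"{ident} was withdrawn")
        return self._assigned[ident]

    def retire(self, ident: str) -> None:
        """Withdraw an identifier without freeing its number for reuse."""
        self._retired.add(ident)


reg = IdentifierRegistry()
a = reg.mint("Moby Dick (book)")
b = reg.mint("Herman Melville (person)")
reg.retire(a)
c = reg.mint("Typee (book)")
print(a, b, c)  # ex:00000001 ex:00000002 ex:00000003
```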
    • 21. Note on “Consistent”
      • The same thing may have more than one identifier – this happens naturally in the creation of metadata. It’s not a huge problem as long as you have a way of saying that:
      • A = B
      • … so that you can bring together the identifiers for the same thing. (cf. VIAF; also xISBN)
      • This is the basis for mapping between vocabularies so that metadata can be more easily re-used
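Recording `A = B` statements and bringing equivalent identifiers together is a classic union-find (disjoint-set) problem. The sketch below uses that technique; it is not a description of how VIAF or xISBN actually work, and the identifiers are hypothetical:

```python
class Equivalences:
    """Union-find over identifiers: record 'A = B' and gather everything equal to a given ID."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def _find(self, x: str) -> str:
        """Return the representative of x's cluster, registering x if it is new."""
        self._parent.setdefault(x, x)
        while self._parent[x] != x:
            self._parent[x] = self._parent[self._parent[x]]  # path halving keeps trees shallow
            x = self._parent[x]
        return x

    def same_as(self, a: str, b: str) -> None:
        """Record that identifiers a and b name the same thing."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def cluster(self, x: str) -> set[str]:
        """All identifiers known to be equivalent to x, including x itself."""
        root = self._find(x)
        return {y for y in list(self._parent) if self._find(y) == root}


eq = Equivalences()
eq.same_as("lc:n-melville", "viaf:melville")
eq.same_as("viaf:melville", "local:author-42")
print(sorted(eq.cluster("local:author-42")))
# ['lc:n-melville', 'local:author-42', 'viaf:melville']
```

Because equivalence is transitive, the local identifier reaches the LC identifier even though nobody stated that pair directly.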
    • 22. Identifier Readability
      • Opaque: no meaning to the identifier, ex.: LCCN example (just a number)
      • Readable: makes sense to a human, ex.: Wikipedia page IDs (include page name or partial page name)
      • Can be both: system can add readable bit to opaque identifier, ex.: Open Library thing IDs
      • Choices here are controversial, and have a big impact on multilingual efforts
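A sketch of the “both” option: an opaque identifier with a readable tail appended, where only the opaque part is used for lookup. The base URI, the `W0001` identifier, and the function names are hypothetical, and this only imitates the pattern the slide attributes to Open Library:

```python
import re
import unicodedata


def slugify(label: str) -> str:
    """Readable URL segment: fold to ASCII, replace other runs of characters with '_'."""
    ascii_label = unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^A-Za-z0-9]+", "_", ascii_label).strip("_")


def make_uri(base: str, opaque_id: str, label: str) -> str:
    """Opaque identifier first, readable tail second."""
    return f"{base}/{opaque_id}/{slugify(label)}"


def opaque_part(uri: str, base: str) -> str:
    """Lookup key: only the opaque segment right after the base URI."""
    return uri[len(base) + 1:].split("/", 1)[0]


BASE = "https://example.org/works"  # hypothetical base URI
uri = make_uri(BASE, "W0001", "Moby-Dick; or, The Whale")
print(uri)  # https://example.org/works/W0001/Moby_Dick_or_The_Whale
# A different (or translated) tail still resolves to the same identifier.
print(opaque_part(uri, BASE) == opaque_part(f"{BASE}/W0001/Moby_Dick", BASE))  # True
```

The ASCII folding in `slugify` shows why these choices matter for multilingual work: `slugify("Les Misérables")` gives `Les_Miserables`, but a title in Cyrillic script folds to an empty string, so the readable part disappears entirely.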
    • 23.  
    • 24. The Open World Assumption
      • “The open world assumption (OWA) is used in knowledge representation to codify the informal notion that in general no single agent or observer has complete knowledge, and therefore cannot make the closed world assumption.”
      • --Wikipedia
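The difference between the two assumptions shows up when a question has no matching statement. A toy three-valued lookup, not a reasoner; the `ex:` names and both statement sets are hypothetical:

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


Statement = tuple[str, str, str]

# Hypothetical knowledge base: statements known to be true...
known_true: set[Statement] = {("ex:MobyDick", "ex:author", "ex:Melville")}
# ...and statements explicitly known to be false (an open-world system must be told these).
known_false: set[Statement] = {("ex:MobyDick", "ex:author", "ex:Thoreau")}


def ask_closed(stmt: Statement) -> Answer:
    """Closed world: anything not known to be true is false."""
    return Answer.TRUE if stmt in known_true else Answer.FALSE


def ask_open(stmt: Statement) -> Answer:
    """Open world: a missing statement means 'unknown', not 'false'."""
    if stmt in known_true:
        return Answer.TRUE
    if stmt in known_false:
        return Answer.FALSE
    return Answer.UNKNOWN


q = ("ex:MobyDick", "ex:translator", "ex:SomeTranslator")
print(ask_closed(q).value, ask_open(q).value)  # false unknown
```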
    • 25. Things with relationships to other things
      • [Diagram: Thing → Relationship → Thing]
    • 26. Things with relationships to other things
      • [Diagram: Thing → Relationship → Thing, labeled Subject → Predicate (verb) → Object]
    • 27. Object can be a URI or a "string"
      • A URI is a thing
      • Some examples:
        • book -- has author -- [lcname#]
        • book -- has author -- "John Doe"
      • Subject and predicate must be URIs
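The rules on this slide map directly onto three small types. A hedged sketch: the class names, the `example.org` URIs (standing in for the slide’s `[lcname#]` authority link), and the http(s)-only check are simplifying assumptions; real RDF work would use a library such as rdflib, which provides its own `URIRef` and `Literal` classes:

```python
from dataclasses import dataclass
from typing import Union
from urllib.parse import urlparse


@dataclass(frozen=True)
class URI:
    """A thing: something outside the metadata that others can link to."""
    value: str

    def __post_init__(self) -> None:
        # Simplification: accept only absolute http(s) URIs.
        parsed = urlparse(self.value)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an http(s) URI: {self.value!r}")


@dataclass(frozen=True)
class Literal:
    """A string: displayable, but nothing further to link to."""
    value: str


@dataclass(frozen=True)
class Triple:
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]

    def __post_init__(self) -> None:
        # Dataclasses do not enforce annotations, so check the slide's rules explicitly.
        if not isinstance(self.subject, URI) or not isinstance(self.predicate, URI):
            raise TypeError("subject and predicate must be URIs")
        if not isinstance(self.obj, (URI, Literal)):
            raise TypeError("object must be a URI or a string literal")


book = URI("http://example.org/book/1")
has_author = URI("http://example.org/vocab/hasAuthor")
t1 = Triple(book, has_author, URI("http://example.org/person/melville"))  # object is a thing
t2 = Triple(book, has_author, Literal("John Doe"))                        # object is a string
```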
    • 28. [Diagram that shows this; I have a slide]
    • 29.  
    • 30. Triples or Graphs?
      • Machines work with triples
        • Statements about the same thing have the same subject
      • Graphs are easier for humans to understand
        • In libraries we’re not used to visualizing data as graphs
        • More used to databases, files, hierarchies
      • Making this new world work for us is as much about changing how we think as it is changing what we do
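Turning machine-friendly triples into a human-friendly graph is mostly a matter of grouping statements by subject. A minimal sketch with hypothetical `ex:` identifiers; quoted objects are strings:

```python
from collections import defaultdict

# Hypothetical triples: (subject, predicate, object); quoted objects are strings.
triples = [
    ("ex:book1", "ex:title", '"Moby Dick"'),
    ("ex:book1", "ex:author", "ex:melville"),
    ("ex:melville", "ex:name", '"Herman Melville"'),
    ("ex:book2", "ex:author", "ex:melville"),
]


def group_by_subject(ts: list[tuple[str, str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Each subject becomes a node; its statements become labeled outgoing edges."""
    graph: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for s, p, o in ts:
        graph[s].append((p, o))
    return dict(graph)


graph = group_by_subject(triples)
print(graph["ex:book1"])  # [('ex:title', '"Moby Dick"'), ('ex:author', 'ex:melville')]
print([s for s, edges in graph.items() if ("ex:author", "ex:melville") in edges])
# ['ex:book1', 'ex:book2']
```

The last line walks the edges backward to find every book pointing at `ex:melville`, the kind of connection a shared ‘thing’ makes possible.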
    • 31. “Graph Thinking”
      • Graph relationships are different from tree relationships …
    • 32. Exercise 3: Statements
      • Present a set of triples and ask participants to turn them into sentences
        • Ex.: Book has title ‘Moby Dick’
        • Ex.: Book has author [lcna] or ‘Herman Melville’
        • Ex.: Author has death date XXXX
      • Suggest participants try drawing graphs to represent statements with the same subject
      • Suggest that participants represent how ‘strings’ create dead ends and ‘things’ can be linked
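A facilitator could check Exercise 3 answers with a sketch like this one (hypothetical triples and predicate labels): each triple reads as a sentence, and following only thing-valued objects shows where strings create dead ends:

```python
# Hypothetical triples: objects starting with "ex:" are things, quoted objects are strings.
triples = [
    ("ex:book1", "ex:title", '"Moby Dick"'),
    ("ex:book1", "ex:author", "ex:melville"),
    ("ex:book2", "ex:author", '"Herman Melville"'),
    ("ex:melville", "ex:name", '"Herman Melville"'),
]
# Hypothetical labels for reading predicates aloud.
LABELS = {"ex:title": "has title", "ex:author": "has author", "ex:name": "has name"}


def as_sentence(s: str, p: str, o: str) -> str:
    """Read a triple as subject, predicate label, object."""
    return f"{s} {LABELS.get(p, p)} {o}"


def is_string(obj: str) -> bool:
    """Strings are never subjects of further statements, so a path stops there."""
    return obj.startswith('"')


def reachable(start: str) -> set[str]:
    """Things reachable from start by following thing-valued objects (depth-first)."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(o for s, _, o in triples if s == node and not is_string(o))
    return seen


for t in triples[:2]:
    print(as_sentence(*t))
# ex:book1 has title "Moby Dick"
# ex:book1 has author ex:melville
print(sorted(reachable("ex:book1")))  # ['ex:book1', 'ex:melville']
print(sorted(reachable("ex:book2")))  # ['ex:book2']: the author is a string, a dead end
```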
    • 33. Exercise 4: Statements
      • Give each group a web page with a description
      • Ask them to organize the data as statements
      • See if the site you are using has data for persons, subjects or places
      • Discussion
        • How hard was it to find the ‘things’?
        • Did you always have the predicates you needed?
        • How different is this from today’s metadata?
    • 34.  
    • 35. Properties and Classes
      • Record-based metadata is often in the form of ‘records’, using elements from only one schema
      • Statement-based metadata is often more flexible
        • Proper declaration, definition and management of the elements is very important
        • Mix and match is part of the value
      • Some current schemas might find the transition from records to statements more challenging
        • Especially where the definition of the property depends on its place in a hierarchy (MODS and ONIX, for example)
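Statement-based metadata can mix properties from different vocabularies because each property is a full URI with its own definition, not a position in a record. A sketch using two real namespaces, Dublin Core terms and FOAF; the subject URIs, the objects, and the `expand` helper are hypothetical:

```python
# Namespace URIs for two published vocabularies: Dublin Core terms and FOAF.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'dcterms:title' into a full property URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


# Hypothetical statements about one book and its author, mixing both vocabularies.
book = "http://example.org/book/1"
person = "http://example.org/person/melville"
statements = [
    (book, expand("dcterms:title"), "Moby Dick"),
    (book, expand("dcterms:creator"), person),
    (person, expand("foaf:name"), "Herman Melville"),
]

for _, prop, value in statements:
    print(prop, value)
```

Neither property’s meaning depends on where it sits; an element defined by its place in a hierarchy would first need its own standalone definition before it could be mixed in this way.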
    • 36. Hierarchy (top-down organization)
      • A: Military assets → Dogs ≠ B: Pets → Dogs
      • [Diagram: hierarchy A, Military assets (Guns, Dogs); hierarchy B, Pets (Cats, Dogs)]
    • 37. Caveats
      • Unless … there is a definition of dog and it can be used in either hierarchy
      • But if the meaning is defined by the hierarchy, the hierarchy is part of its meaning
    • 38. Bottom-up organization
      • “Dogs” has meaning on its own and can be used in multiple contexts.
      • [Diagram: Dogs, used under both Military assets and Pets]
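The dogs example can be restated in code. A toy sketch, with hypothetical `ex:` identifiers and an illustrative definition: when meaning comes from a path in a hierarchy, the two “Dogs” never match; when “Dogs” has its own identifier, both contexts share it:

```python
# Top-down: a term's meaning depends on its position, so identify it by its full path.
hierarchy_a = {"Military assets": ["Guns", "Dogs"]}
hierarchy_b = {"Pets": ["Cats", "Dogs"]}


def path_terms(tree: dict[str, list[str]]) -> set[str]:
    """Each term is identified by its parent/child path."""
    return {f"{parent}/{child}" for parent, children in tree.items() for child in children}


# Bottom-up: "Dogs" is defined once under its own (hypothetical) identifier...
DOGS = "ex:Dogs"
definitions = {DOGS: "Domesticated canines"}  # illustrative definition
# ...and each context simply refers to that identifier.
contexts = {"Military assets": {"ex:Guns", DOGS}, "Pets": {"ex:Cats", DOGS}}

print(path_terms(hierarchy_a) & path_terms(hierarchy_b))  # set()
print(contexts["Military assets"] & contexts["Pets"])      # {'ex:Dogs'}
```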
    • 39. Exercise 6: Mix & Match
      • Each group is assigned an entity to describe in metadata
      • Around the room are poster-sized depictions of various vocabularies and their definitions
      • Groups are instructed to study their task, determine what elements they need, then get up and look at the posters
        • Getting up and contemplating the posters encourages conversation!
        • Discussion: How do you decide what’s fit for purpose?
    • 40. Overview of Training Plan
    • 41. Feedback
      • Important questions as we continue to build this program
        • Does the program plan seem useful? If not, what’s missing?
        • Does the content of the session seem at an appropriate level? What could be improved?
      • What advice can you give about bringing this program to libraries?
        • Is there a place for F2F training in your budgets?
        • Would you pay for personalized online training for staff or local trainers?
    • 42.
      • Slide Credits:
      • Karen Coyle
      • Diane Hillmann
      • Contact info: [email_address]
      • Metadata Matters: