1 d.1



  1. Jabin White, Director of Strategic Content, Wolters Kluwer Health – Professional & Education. SSP 31st Annual Meeting, Baltimore, MD | May 28, 2009
  2. • What is metadata, and why should publishers care?
     • Impact on publishers – how metadata impacts processes
     • Case Studies – This isn’t your Daddy’s publishing business
     • Final Thoughts, Recommendations
  3. • Reading most definitions of metadata and related standards is like trying to resolve disputes with my kids
     • As Ed said, metadata is “data about data”
        • But what does that mean?
     • Its use may be increasing, but metadata is NOT new
  4. • In the move from print publishing to digital, metadata is a powerful tool to help publishers get content in the right place, in the right format, and known to the right systems and people, at the right time
     • Print books were easy
        • Everyone knew what they were
        • You could really only use them one way
        • They had a beginning, an end, a physical presence, and a set price (mostly)
  5. • Today, computers are often communicating with one another as much as they are with users (people)
     • Metadata becomes critical in:
        • B2B relationships
        • Enhancing B2C relationships
        • B2-_________ relationships
     • The quality of the metadata gives publishers a more powerful voice in what happens to their content
  6. • For example:
        • A digital asset (an image)
        • What file format is it?
        • How big is the image?
        • Who took the picture?
        • Who owns the picture?
        • Can you use it on your web site? If you do, what credit do you have to give to the owner?
        • What date was it created?
        • Is it part of a collection?
        • Is it related to another piece of content?
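The questions on slide 6 map naturally onto fields of a metadata record. The following is a minimal sketch, assuming a hypothetical `ImageAsset` class; the field names, the fallback credit wording, and the sample values are illustrative and do not come from the talk.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ImageAsset:
    """Hypothetical metadata record for one digital image."""
    file_format: str                  # What file format is it?
    width_px: int                     # How big is the image?
    height_px: int
    photographer: str                 # Who took the picture?
    rights_holder: str                # Who owns the picture?
    web_use_allowed: bool             # Can you use it on your web site?
    credit_line: Optional[str]        # What credit must be given to the owner?
    created: date                     # What date was it created?
    collection: Optional[str] = None  # Is it part of a collection?
    related_ids: List[str] = field(default_factory=list)  # Related content

    def web_credit(self) -> Optional[str]:
        """Return the credit to display, or None if web use is not allowed."""
        if not self.web_use_allowed:
            return None
        # Fall back to a generic credit (assumed wording) when none is recorded.
        return self.credit_line or f"Image: {self.rights_holder}"


# Sample values are made up for illustration.
asset = ImageAsset("JPEG", 1200, 800, "A. Photographer", "Example Press",
                   True, None, date(2009, 5, 28))
print(asset.web_credit())
```

Because the rights information travels with the asset, a downstream system can decide whether and how to display the image without asking a person.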
  7. If a publisher’s goal is to disseminate content to the widest possible audience, metadata is critical
  8. • Again, in books you had one use model
     • Metadata allows publishers to have diverse relationships with content consumers and other information providers
        • Customers (duh)
        • Aggregators
        • The Open Web (not Google, but other search engines)
           • But don’t try to “game” the search engines with adult keywords; that’s just wrong
           • There have been lawsuits over use of meta keywords, including Playboy suing two adult web sites
        • Technology partners/developers
        • Systems wherein content is a “value add”
        • Multiple output formats
  9. HTML Metadata
     • <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
     • <meta name="verify-v1" content="kBoFGUuwppiWVWGx4Ypzkw1Cs1GgMYEMMbfNr7FY65w=" />
     • For people: <meta name="description" content="International publisher of professional health information for physicians, nurses, specialized clinicians & students. Medical & nursing charts, journals, and pda software.">
     • For search engines: <meta name="keywords" content="springhouse, medical book, nursing journal, medical pda software, lippincott medical reference, lww, lippincott, lww com, medical publisher">
     • <link rel="stylesheet" href="/css/style.css" type="text/css">
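To show how a program, rather than a person, reads the tags on slide 9, here is a minimal sketch using Python's standard `html.parser` module. The `MetaCollector` class and `extract_meta` function are hypothetical names, and the sample page shortens the slide's `content` values.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("name"), attr.get("content")
        # Skip tags such as http-equiv that carry no name/content pair.
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_meta(html: str) -> dict:
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return parser.meta


# Abbreviated version of the slide's markup (content values shortened).
page = """
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="description" content="International publisher of professional health information.">
<meta name="keywords" content="medical book, nursing journal, medical publisher" />
</head>
"""

meta = extract_meta(page)
keywords = [k.strip() for k in meta["keywords"].split(",")]
print(meta["description"])
print(keywords)
```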
  10. Classifying Metadata: Descriptive Metadata
      • ISBN (I told you this wasn’t new)
      • Dewey Decimal System
      • Books in Print/CIP/Library of Congress data
      • MARC records
      • DOI (Digital Object Identifier)
      (sorry, my examples are from STM)
      • ICD-9 and ICD-10 Codes
      • MeSH
      • SNOMED-CT
      • NANDA, NIC, NOC for Nursing
      • NDC, HCPCS for drugs
  11. Classifying Metadata: Descriptive Metadata
      • ISBN (I told you this wasn’t new)
      • Dewey Decimal System
      • Books in Print/CIP/Library of Congress data
      • MARC records
      • DOI (Digital Object Identifier)
      (sorry, my examples are from STM)
      • ICD-9 and ICD-10 Codes
      • MeSH
      • SNOMED-CT
      • NANDA, NIC, NOC for Nursing
      • NDC, HCPCS for drugs
  12. Using controlled vocabularies, extra power can be added to content via semantic tagging to drive:
      • More precise searching
      • Contextually-based connections
      • Lowering of “two terms meaning the same thing” syndrome (hypertension vs. high blood pressure; heart attack vs. myocardial infarction)
      • Filling in of content gaps
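The "two terms meaning the same thing" problem on slide 12 can be sketched as a lookup from synonyms to one preferred term. This is a minimal sketch: the tiny `VOCABULARY` table is hypothetical, and the naive substring matching stands in for whatever indexing a real controlled vocabulary such as MeSH would use.

```python
# Hypothetical controlled vocabulary: preferred term -> known synonyms.
VOCABULARY = {
    "hypertension": {"high blood pressure"},
    "myocardial infarction": {"heart attack"},
}

# Reverse lookup: every synonym (and the preferred term itself) -> preferred term.
SYNONYM_TO_PREFERRED = {
    synonym: preferred
    for preferred, synonyms in VOCABULARY.items()
    for synonym in synonyms | {preferred}
}


def tag_text(text: str) -> set:
    """Return the preferred terms mentioned in the text (naive substring match)."""
    lowered = text.lower()
    return {pref for syn, pref in SYNONYM_TO_PREFERRED.items() if syn in lowered}


def search(documents: dict, query: str) -> list:
    """Return ids of documents sharing at least one preferred term with the query."""
    wanted = tag_text(query)
    return sorted(doc_id for doc_id, text in documents.items()
                  if wanted & tag_text(text))


docs = {
    "a": "Managing high blood pressure in adults",
    "b": "Hypertension guidelines",
    "c": "Recovery after a heart attack",
}
print(search(docs, "hypertension"))  # matches both "a" and "b"
```

A search for either wording finds both documents because each is tagged with the same preferred term.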
  13. How Metadata Changes Processes
  14. • Impact on publishers depends on answers to questions in previous section
         • i.e., what am I going to get in return for investing in metadata, and is it worth it?
         • More and more, this is not an “if” proposition, it’s “how much”
      • Publishers who buy in have two basic choices on approach:
  15. • Requires deeper commitment, but has bigger potential upside
         • Positive impact on product creation and development
      • Requires thinking about tools, workflows, and enterprise-level systems to allow for creation and maintenance of metadata
      • Combination of good metadata in the workflow and creativity in product development team can pay big benefits
      • Allows participation of authors (or subject matter experts in lieu of) at the beginning of the workflow
  16. • Requires lesser commitment, but potentially fewer rewards
      • Can be done with zero impact on current systems
      • Has benefit of content being in “final form” (whatever that means anymore) when intelligence is added in metadata
      • Can keep SMEs as a separate offshoot of the workflow – easily outsourced
      • Can replace all of the above with software solutions (Darrell and Chris will talk about that) :)
  17. • Chris, Darrell and I do NOT disagree
      • There are justifications that can be made for tagging or entity extraction approaches (or both)
      • Just as there is no “one size fits all” metadata, there is no ONE solution
      • But if you must pick one, I’m right :)
  18. Active vs. Passive Metadata
      • Active metadata
         • Publisher intentionally associates markup with certain pieces of content
         • Often using controlled vocabulary
         • Includes semantic indexing
         • Can also be machine-based, using scripts, etc.
      • Passive metadata
         • Metadata created based on use of content
         • Inheritance of properties from parent objects
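One reading of "inheritance of properties from parent objects" on slide 18 is that a chapter or topic picks up its parent's metadata unless it sets its own. The sketch below assumes that reading; `ContentNode` and the sample values are hypothetical.

```python
from typing import Optional


class ContentNode:
    """Hypothetical content object that inherits metadata from its parent."""

    def __init__(self, name: str, parent: Optional["ContentNode"] = None, **metadata):
        self.name = name
        self.parent = parent
        self.own = dict(metadata)  # values set explicitly on this object

    def effective_metadata(self) -> dict:
        """Merge inherited values with this node's own; own values win."""
        inherited = self.parent.effective_metadata() if self.parent else {}
        return {**inherited, **self.own}


# Sample values are made up for illustration.
book = ContentNode("Drug handbook", publisher="Example Press",
                   language="en", subject="pharmacology")
chapter = ContentNode("Anticoagulants", parent=book, subject="anticoagulants")

print(chapter.effective_metadata())
```

Here the chapter never states its publisher or language, yet both are available to any system that asks, while its more specific subject overrides the book-level one.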
  19. • The use of active metadata usually means an impact on support tools
         • Re-think authoring tools to allow for capture of metadata by authors
            • This can be outsourced to external SMEs – help is available
         • Re-think content management to allow for preservation/management of metadata
      • How deep you go depends on how big the payoff
         • Good semantic indexing can drive new features and functionality, but must be used appropriately
      • If you decide to add active metadata, a controlled vocabulary just became your new best friend
  20. • Ontology – an explicit specification of a conceptualization
         • In English: a controlled vocabulary used to describe a group of topics
      • Taxonomy – same as ontology, but with hierarchy implied
      • Caveat – These two terms are so misused, their definitions no longer matter (think Content Management circa 2000)
  21. • PRISM (Publishing Requirements for Industry Standard Metadata) – an XML metadata vocabulary for handling content – started out in magazines and journals, but has added other types
      • Dublin Core – named after a 1995 workshop in Dublin, Ohio, it is, very simply, a set of 15 agreed-upon metadata elements used to describe objects
         • PRISM uses Dublin Core elements and then makes them specific to publishing
      • RDF (Resource Description Framework) – a data model, commonly serialized as XML, that lets you richly describe relationships between data on web pages. Explain triples (subject, predicate, object)
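An RDF statement is a triple: a subject, a predicate, and an object. The sketch below writes three triples about one book using Dublin Core element names as predicates and prints them in N-Triples syntax. The book URI and all values are hypothetical, the `to_ntriples` helper is illustrative only, and it does not escape special characters in literals as a real serializer would.

```python
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
BOOK = "http://example.org/book/1"       # hypothetical identifier for one book

# Each statement is (subject, predicate, object); all values are made up.
triples = [
    (BOOK, DC + "title", "Example Drug Handbook"),
    (BOOK, DC + "publisher", "Example Press"),
    (BOOK, DC + "language", "en"),
]


def to_ntriples(statements) -> str:
    """Render triples with literal objects as N-Triples lines (no escaping)."""
    lines = []
    for subject, predicate, obj in statements:
        lines.append(f'<{subject}> <{predicate}> "{obj}" .')
    return "\n".join(lines)


print(to_ntriples(triples))
```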
  22. • Semantic Web – A web of data. Envisioned by Tim Berners-Lee, it will be a web driven by data that “talks” to other data
         • My kids will work on this
      • FOAF Project (Friend of a Friend): Uses RDF to describe people and their preferences to the web, so you can find people with similar interests; all about social networking
      • SPARQL (SPARQL Protocol and RDF Query Language) – once you have used RDF to describe resources and their connection points, you use SPARQL to ask questions about those connections and find stuff
      • OWL (Web Ontology Language) – extends ability of RDF and XML Schemas to describe information
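To give a feel for "asking questions about connections", here is a toy triple-pattern matcher in plain Python. It is not SPARQL and uses no RDF library; it only imitates the idea of a single query pattern with variables. The people, the `interest` and `knows` predicates, and the `match` function are all hypothetical.

```python
# Toy data in the spirit of FOAF: who is interested in what, who knows whom.
triples = {
    ("alice", "interest", "metadata"),
    ("bob", "interest", "metadata"),
    ("carol", "interest", "typography"),
    ("alice", "knows", "carol"),
}


def match(pattern, statements):
    """Yield variable bindings for one (s, p, o) pattern.

    Terms starting with '?' are variables; other terms must match exactly.
    """
    for triple in statements:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


# Roughly the question: SELECT ?who WHERE { ?who :interest "metadata" }
people = sorted(b["?who"] for b in match(("?who", "interest", "metadata"), triples))
print(people)
```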
  23. Drug Reference Product
      • Perfect, structured information that is a great example of metadata becoming just as important as content
      • Examples of things that were stored in metadata:
         • Codes, codes, and more codes
         • Drug interaction information
         • Classifications (this one was actually redundant)
         • Formulary information
         • FDA approval date (could also be redundant)
  24. • Four editors spent as much time working on metadata as they did on content itself
      • All work on import/export from DB was done by:
         • Acting on metadata
         • Keeping metadata at top of priority list on output
         • “Output all anticoagulant drugs that were approved before 1982”
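The sample request on slide 24 is a filter over two metadata fields, with no need to read the drug monographs themselves. A minimal sketch follows, assuming each record carries a drug class and an FDA approval date; the record layout, the `select_drugs` function, and the placeholder drugs and dates are all hypothetical.

```python
from datetime import date

# Placeholder records; names, classes, and dates are invented for illustration.
drugs = [
    {"name": "Drug A", "drug_class": "anticoagulant", "fda_approved": date(1975, 6, 1)},
    {"name": "Drug B", "drug_class": "anticoagulant", "fda_approved": date(1995, 3, 15)},
    {"name": "Drug C", "drug_class": "antibiotic", "fda_approved": date(1960, 9, 30)},
]


def select_drugs(records, drug_class: str, approved_before: date) -> list:
    """Return names of drugs in the given class approved before the cutoff date."""
    return [
        record["name"]
        for record in records
        if record["drug_class"] == drug_class
        and record["fda_approved"] < approved_before
    ]


# "Output all anticoagulant drugs that were approved before 1982"
print(select_drugs(drugs, "anticoagulant", date(1982, 1, 1)))
```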
  25. • Medical content (5 years ago I would have said “book”)
      • Thousands of topics, sometimes printed, always updated, sent to web, handhelds
      • How/when they are updated, whether or not they are printed, and whether or not they get extracted is all driven by … Metadata!
  26. • All extracts are made by acting on metadata
         • What is the subject area of the topic? (this can be a MANY to ONE relationship)
         • When was the topic last updated?
         • Who was the author of the last update?
  27. ID values assigned during XML conversion
  28. Gender values assigned by authors
  29. • Have a metadata strategy
         • Business case should support investment in metadata
         • Be careful, and stay alert for mission creep – this stuff can get out of control very easily
      • Know your organization
         • Is it a change-tolerant organization? “All in” vs. measured, incremental approach should be considered
         • Show me someone who says they have the correct universal approach to metadata, and I’ll show you a liar
  30. • A little bit of metadata understanding by product development people can go a long way
      • If a content set can benefit from metadata in the creation of new products, that could justify investment in metadata strategy and tools within the workflow
  31. Jabin White, Jabin.white@wolterskluwer.com
  32. 1. Contributor; 2. Coverage; 3. Creator; 4. Date; 5. Description; 6. Format; 7. Identifier; 8. Language; 9. Publisher; 10. Relation; 11. Rights; 12. Source; 13. Subject; 14. Title; 15. Type