Virtusales ebook metadata


Presentation to BiblioLite users at Virtusales offices in Hove on 3rd November 2011. Emily Gibson of Corbas Consulting Ltd.

Published in: Technology, Education, Business

  • There are a lot of things about metadata that I don't know. And there's a good reason for this: most production artists limit their involvement to the metadata within the EPUB file. Every person involved in the process knows their little niche best. However, as systems begin to reflect the importance of metadata across all aspects of the publishing process, metadata systems need to be integrated and streamlined - so we have a single source for all content. As with all aspects of digital publishing, knowledge integration needs to work across departments.
  • An e-book can take many forms: PDF, HTML file, EPUB, proprietary format (Kindle), app (e.g. iPad/phone, Android), text file (Project Gutenberg), etc. The problem is that a book exists in multiple versions: you want to link them, but currently there’s no way to do so, especially if you are working and archiving within a single-source, multiple-platform output workflow (e.g. XML first). How do you identify a book across its multiple manifestations? Two candidates are the DOI and the ISTC. A DOI can be assigned to any digital object and resolves to a permanent URI. The ISTC provides sales analysis systems, retail websites, library catalogues and other bibliographic systems with a method of automatically linking together publications of the “same content” and/or “related content”, thus improving discoverability of products and efficiencies. An ISTC number is the link between a user’s search for a piece of content and the ultimate sale or loan of a publication.
  • EPUB consists of three specifications: Open Container Format (OCF), which describes the on-disk format; Open Packaging Format (OPF), which describes the metadata and TOC; and Open Publication Structure (OPS), which describes the content formats.
  • An EPUB contains the following files:
    - mimetype file: a single text file containing the single line “application/epub+zip” (missing in the image above)
    - META-INF folder: contains a single container.xml file; if you embed fonts or use DRM, an encryption.xml file will also sit here
    - OPS folder: contains all the XHTML files for the book, a CSS file for the styles and image files if required
    - XHTML files (XHTML is a stricter variant of HTML): one file for each chapter (which forces page breaks) and one for the cover, all using UTF-8 encoding
    - toc.ncx file: for the navigational TOC, includes play order; an XML file
    - package.opf file: a simple XML text file that describes the contents of the book; consists of metadata, manifest, spine and guide
  • Multiple creator elements are allowed. The source element holds the original ISBN of the print book.
  • It's a way of organising information: think of a library catalogue. "The association of standardized descriptive metadata with networked objects has the potential for substantially improving resource discovery capabilities by enabling field-based (e.g., author, title) searches, permitting indexing of non-textual objects, and allowing access to the surrogate content that is distinct from access to the content of the resource itself." (Weibel and Lagoze, 1997) Quality of metadata is a concern.
  • "deliver rich product information into the supply chain in a standard form, to wholesalers and distributors, to larger retailers, to data aggregators, and to affiliate companies" "a template for the content and structure of a product record, ONIX has helped to stimulate the introduction of better internal information systems, capable of bringing together all the “metadata” needed for the description and promotion of new and backlist titles. The same core data can also be used to produce advance information sheets, catalogues and other promotional material." Automation means: publisher, not retailer, in control; lower risk of error; time saving.
  • This is likely the content of your database. There are three main standards: ONIX for Books, ONIX for Serials and ONIX for Publications Licenses (licenses for libraries' digital resources). “Not a data model” means that it’s not meant to be a plan for the architecture of your content. ONIX for Books is in version 3.0, which is not backwards compatible with previous versions of ONIX.
  • Remember: ONIX is not a data model, and it is not enough for discoverability.
  • "Dublin Core has as its goals: Simplicity of creation and maintenance The Dublin Core element set has been kept as small and simple as possible to allow a non-specialist to create simple descriptive records for information resources easily and inexpensively, while providing for effective retrieval of those resources in the networked environment. Commonly understood semantics Discovery of information across the vast commons of the Internet is hindered by differences in terminology and descriptive practices from one field of knowledge to the next. The Dublin Core can help the "digital tourist" -- a non-specialist searcher -- find his or her way by supporting a common set of elements, the semantics of which are universally understood and supported. For example, scientists concerned with locating articles by a particular author, and art scholars interested in works by a particular artist, can agree on the importance of a "creator" element. Such convergence on a common, if slightly more generic, element set increases the visibility and accessibility of all resources, both within a given discipline and beyond. International scope The Dublin Core Element Set was originally developed in English, but versions are being created in many other languages, including Finnish, Norwegian, Thai, Japanese, French, Portuguese, German, Greek, Indonesian, and Spanish. The DCMI Localization and Internationalization Special Interest Group is coordinating efforts to link these versions in a distributed registry. Although the technical challenges of internationalization on the World Wide Web have not been directly addressed by the Dublin Core development community, the involvement of representatives from virtually every continent has ensured that the development of the standard considers the multilingual and multicultural nature of the electronic information universe. 
Extensibility While balancing the needs for simplicity in describing digital resources with the need for precise retrieval, Dublin Core developers have recognized the importance of providing a mechanism for extending the DC element set for additional resource discovery needs. It is expected that other communities of metadata experts will create and administer additional metadata sets, specialized to the needs of their communities. Metadata elements from these sets could be used in conjunction with Dublin Core metadata to meet the need for interoperability. The DCMI Usage Board is presently working on a model for accomplishing this in the context of "application profiles."
  • These can be used inside an EPUB. Of these, three are required: identifier, title and language. Is this good enough for discoverability? What’s missing? Genre, for example? Subject area? There is a freeform element called meta that has two attributes, name and content. You could use this for additional data, but it’s not consistent enough to use reliably for discovery.
  • Metadata in EPUB 3 can express very rich relationships. For example, the link element allows you to reference metadata by URL, e.g. a DOI or the URL for the ONIX feed. The minimum metadata in EPUB 3 is one identifier element, a title element and a language element, plus a meta element giving the last modified date. The meta element allows you to express the vocabulary (e.g. ONIX, Dublin Core, etc.) and a relationship to another metadata element, as well as the content and name. The example they give is that you could use meta to identify that a piece of content is a video clip and then refine it with another meta tag to give the duration. A good example from the spec: <dc:creator id="creator">Haruki Murakami</dc:creator> <meta refines="#creator" property="role" scheme="marc:relators" id="role">aut</meta>. That says that Haruki Murakami is the creator, and the refinement says that he's the author. That's the new way to do <dc:creator opf:role='aut'>Haruki Murakami</dc:creator>, by the way. The basis is that you can express relationships in the metadata. One good thing: the new metadata standard should stop that horrible problem of having to create a dummy author with the names of all of the authors. Another example of useful metadata (for this audience): <link rel="onix-record" href=" "/>
  • The metadata database for these books is all keyed individually, and for this particular website, but what about putting a genre field into your epub to make your content more discoverable?
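
As a rough illustration of the EPUB 3 refinement mechanism described in the notes above, the following Python sketch reads creators, their refined roles and any subject entries from an OPF metadata block. The sample package document, its placeholder identifier and the function names are assumptions made up for this example; only the creator/meta pair comes from the notes.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OPF package documents and Dublin Core elements.
NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical EPUB 3 package document; identifier, title and subject are placeholders.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Example Title</dc:title>
    <dc:language>en</dc:language>
    <dc:creator id="creator">Haruki Murakami</dc:creator>
    <meta refines="#creator" property="role" scheme="marc:relators" id="role">aut</meta>
    <dc:subject>Fiction</dc:subject>
    <meta property="dcterms:modified">2011-11-03T00:00:00Z</meta>
  </metadata>
</package>
"""


def creators_with_roles(opf_xml):
    """Return (name, role) pairs; role is None when no refinement exists."""
    metadata = ET.fromstring(opf_xml).find("opf:metadata", NS)
    # Map the id that each role refinement points at to the role code.
    roles = {}
    for meta in metadata.findall("opf:meta", NS):
        target = meta.get("refines")
        if meta.get("property") == "role" and target:
            roles[target.lstrip("#")] = (meta.text or "").strip()
    return [
        ((creator.text or "").strip(), roles.get(creator.get("id")))
        for creator in metadata.findall("dc:creator", NS)
    ]


def subjects(opf_xml):
    """Return the text of every dc:subject element, e.g. for genre search."""
    metadata = ET.fromstring(opf_xml).find("opf:metadata", NS)
    return [(s.text or "").strip() for s in metadata.findall("dc:subject", NS)]


if __name__ == "__main__":
    print(creators_with_roles(SAMPLE_OPF))
    print(subjects(SAMPLE_OPF))
```

Because each creator carries its own id and refinement, several authors can be listed separately, each with a role, rather than being merged into one dummy author string.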

    1. E-book metadata. Presented by Emily Gibson on 3rd November 2011 [email_address]
    2. What is an e-book?
       • A distributable digital version of a book
    3. What is ...
       • An industry standard
       • Maintained by the International Digital Publishing Forum (IDPF)
       • Zip file consisting of: OCF, OPF, OPS
       • For more information, see
    4. Inside an EPUB ... and so on, one HTML file for each chapter ...
    5. Inside OPF file (metadata)
    6. Metadata
       • Often defined as ‘data about data’
       • A metadata record is a set of attributes or elements necessary to describe the resource
       • Metadata predates digital publishing, but it’s become crucial for digital information retrieval
       • E-book metadata involves both the data embedded in the EPUB file and the data that is held separate from the book (e.g., library catalogue, ONIX feeds for retail vendors, etc.)
    7. Why is metadata important?
       • discoverability
       • delivers info needed to supply chain, incl. online retail
       • feeds internal information systems
       • automates loading product information to customer-facing systems
          • puts publisher, not retailer, in control
          • lower risk of error
          • time and labour saving
    8. ONIX
       • ONline Information eXchange
       • an international standard
       • an XML file
       • a communications protocol (i.e. computer-to-computer communication), not a data model
       • captures metadata
       • managed by an international steering committee of user group representatives
       • For more information see
    9. ONIX fields
    10. Dublin Core
       • addresses the need to standardise descriptive metadata
       • describes books, digital materials, and composite media like webpages
       • an ISO standard maintained by the Dublin Core Metadata Initiative (DCMI)
       • its goals are:
          • simplicity of creation and maintenance
          • commonly understood semantics
          • international scope
          • extensibility
       • for more information see
    11. Dublin Core elements. The Simple Dublin Core Metadata Element Set (DCMES) consists of 15 metadata elements:
    12. EPUB 2 and EPUB 3
       • as yet, no hardware support for EPUB 3
       • metadata in EPUB 3 is much better
       • publishers will be able to stop sending a separate metadata file to suppliers
       • only one store of metadata means half the scope for errors
       • the meta element is greatly enhanced
       • a link element allows you to reference metadata by URL
    13. Linking books via metadata
    14. Summary
       • metadata can be internal (OPF) and external (ONIX)
       • important for discoverability, tracking and automation
       • most retailers request ONIX feeds
       • the EPUB metadata standard is Dublin Core
       • EPUB 3 metadata is more useful than EPUB 2 and may make separate ONIX feeds redundant
       • future metadata uses will augment discoverability using genre and subject fields, for example
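
To make the internal (OPF) side of the summary concrete, here is a minimal Python sketch that builds a skeleton EPUB container in memory and then checks it the way a simple validator might: mimetype first and uncompressed, container.xml pointing at the package file, and the identifier, title and language elements present. The file contents, the placeholder ISBN and the function names are assumptions for illustration, and the result is not a complete, valid EPUB (it has no content files, manifest entries or NCX).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# container.xml tells a reading system where the package (OPF) file lives.
CONTAINER_XML = """<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OPS/package.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Hypothetical package file with placeholder metadata values.
PACKAGE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">urn:isbn:0000000000000</dc:identifier>
    <dc:title>Example Title</dc:title>
    <dc:language>en</dc:language>
  </metadata>
  <manifest/>
  <spine/>
</package>
"""


def build_epub(opf_xml=PACKAGE_OPF):
    """Zip the skeleton files; mimetype goes first and is stored uncompressed."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OPS/package.opf", opf_xml, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def missing_required_metadata(epub_bytes):
    """Return the required Dublin Core element names absent from the OPF."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        first = zf.infolist()[0]
        if first.filename != "mimetype" or first.compress_type != zipfile.ZIP_STORED:
            raise ValueError("mimetype must be the first entry and uncompressed")
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", NS)
        opf = ET.fromstring(zf.read(rootfile.get("full-path")))
    metadata = opf.find("opf:metadata", NS)
    required = ("identifier", "title", "language")
    return [name for name in required if metadata.find(f"dc:{name}", NS) is None]


if __name__ == "__main__":
    print(missing_required_metadata(build_epub()))
```

The same check could be extended to flag absent optional fields such as subject, which is where genre information for discoverability would sit.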