What Publishers Need to Know About Web Scale Discovery


By Jay Henry
Presented at SSP Annual Conference, June 2013

Published in: Business, Technology
  • Good afternoon, my name is Jay Henry and I’m with Ringgold. We are a data services company with offices in Portland and near Oxford, UK. Our business has two main areas of focus: working with publishers to normalize (clean) and uniquely identify their internal data, and providing metadata creation and dissemination services. [may want to mention Book News] Today I will be speaking as an advocate for excellent metadata, and while I believe everything worth creating is worth thoroughly describing, for the purposes of this talk I will focus my comments on scholarly monographs and basic data elements common to all areas of publishing. The content of this presentation is meant to inform the initiated and educate those new to the concept of “Web Scale Discovery Services” (WSD), but my focus on metadata should apply to any publishing strategy regardless of the downstream target of your data. I will touch on how the emergence of this technology is enabling new types of acquisition models, highlight the challenges to publishers, and provide some practical information for you to consider when deciding how to approach metadata creation. Please understand, I will be speaking about only a small portion of the supply chain directly related to exposure and discovery of content: specifically, the linkage between publishers and their contributors, intermediaries, libraries and their patrons, and the effect and importance of WSDs in that context. Of course, the benefits of well-formed metadata are so profound as to provide a direct benefit to scholarship… I won’t go down the road of making a philosophical argument that publishers have a moral obligation to strive for the highest standards, but you can see how I’m thinking about this topic.
Let’s just not forget that good quality metadata has a positive effect in many areas of the supply chain, and natural efficiencies are the result; that should be reason enough to attempt to stay awake at this awkward hour for consciousness. I will be making the case that the emergence of WSD, in conjunction with new acquisition models, represents a real change in the supply chain, one that requires attention from publishers to ensure their content is in the best possible position to be discovered.
  • There are a lot of terms tossed around when we talk about search. [read terms, then read next] A quick clarification on definitions: are we hearing different names for the same things? No. The term Web Scale or “discovery services” is being used throughout the publishing industry as the most recent darling buzzword… and for good reason. Web search utilities (Google, Bing, etc.) have transformed library patron and researcher behavior. “Search” is maturing as a concept and taking on new dimensions within libraries as they strive to compete with mainstream search services. Define Web Scale as the next step in focused, de-cluttered search capability that provides visibility to resources beyond the library and puts more power in the hands of patrons to influence purchasing, not only through DDA, but by their behavior and the extent to which they interact with content (circ stats as a means of judging/vetting quality/utility, often the same, and then making purchase/renewal decisions). About discovery and systems like these: this is what exposes content, not catalogues or flyers or special promotional emails… What sells and circulates content is putting the right information in front of the right consumer and enabling access. Users (especially librarian buyers) will spend vast amounts of their time with a handful of familiar tools; presenting the right data within those tools should be a top priority for every publisher.
  • More of a “game changer”: I believe WSD services represent a truly mature search technology for libraries that will provide benefits to users and the libraries themselves by allowing non-owned resources to be part of the central index. DDA is emerging as an important new way to present title information to patrons. This model delivers what patrons want, and users have driven adoption of change more than any other factor. The proliferation of WSD goes beyond the main players that I mentioned earlier; some system vendors (of current ILS installations within libraries) have begun to integrate WSD services through partnerships and technology integration.
  • Re: web search. The ability to search across the web changed user behavior and expectations. Federated search has been trying to deliver a similar experience to users, but only now is there the potential to deliver a vastly improved, yet focused, search for academic research. Non-linear lending: might want to mention ProQuest/EBL/Ebrary as innovators in experimenting with new acquisition models.
  • Complexity can be managed by systems; in fact, whenever a need arises, a solution appears. However, the best solutions cannot work with poor quality data: the old cliché of ‘garbage in, garbage out’ still applies. There is more content to describe than ever, and as a result, unique identifiers are the best way to disambiguate and link your data to relevant sources. Metadata has been cooped up for a while, and is not feeling its old strength. I’m here to talk about the importance of good quality metadata (and what is meant by “good quality”) in the context of web scale discovery systems, not because the term is the flavor of the month, but because these systems matter. This is an important trend that I believe will become the standard model not only within academic institutions, but everywhere. ---COUNTER DATA---???? ---Some publishers are better than others. There is a range, and those doing the best job tend to be the largest and most well recognized brands, which increases their ability to ensure their content is discovered; more than ever, descriptive data is a competitive factor.
  • WSDs: the importance of complete metadata in order to support systems no one really understands. The only solution is to provide as much data as possible, giving the broadest description possible, so that the algorithms at work have the raw material that will ultimately produce hits and increased visibility.
  • Publishers must drive the creation and initial proliferation of complete and high quality metadata. (Reference: Nielsen study.) Publishers are the first, and should be the best, source of metadata for a title. Still, much of what can and will be added as part of a ‘description’ of a work will be created after the thing is actually published, and so metadata grows within the supply chain over time; those records that have a strong start will be the most utilized and afford the greatest benefit to the publisher. In my introduction I used the term ‘Publisher as Parent’… One thing a good parent provides for its newly created work is a unique name. In the case of monographs it is possible and advantageous to uniquely identify not only the work and its various manifestations, but also content within the work; deep linking content and expanding the descriptive data associated with each discrete chunk (e.g. chapters) provides an excellent start to a young work’s descriptive foundation. Ultimately, publishers benefit from looking at meta-metadata… metrics that allow publishers to evaluate their publishing strategies and focus on areas where they experience greater success or can see trends in user behavior. Just as important, content will increasingly be judged based on usage: the same data that exposes titles for purchase drives ongoing circulation and renewals.
  • I’ve listed the bare minimum here; the BIC bibliographic standard is a good list of what should be supplied, but of course, more is better, always always always. The important thing to remember about creating good quality metadata is to adhere to standards and uniquely identify everything possible.
  • Let’s get specific about what kind of metadata is worthy of adjectives like ‘better’ or ‘complete’. Unique identifiers allow content to be disambiguated, internally, externally, etc… Standards grease the wheels of the supply chain.
  • In my introduction I used the term ‘Publisher as Parent’… One thing a good parent provides for its newly created work is a unique name. In the case of monographs it is possible and advantageous to uniquely identify not only the work and its various manifestations, but also content within the work; deep linking content and expanding the descriptive data associated with each discrete chunk (e.g. chapters) provides an excellent start to a young work’s descriptive foundation. If we take a few of the participants, apply standard identifiers and adopt a data distribution policy that spreads and enhances the initial record, things begin to change.
  • Highlight slide
  • Highlight slide
  • Highlight slide: metadata combined with standard identifiers CHANGES the supply chain... merging at an ever-increasing rate, and the flow of information across systems will be key to exposing content and realizing sales and use of works.
  • After everything I’ve said to this point, this slide is really a summary of what I’ve already advocated: more is better, wide distribution, standards, unique identification and a policy to create consistent descriptive output… From that strategic foundation more can be done, but this is the minimum. What do I mean by complete? Deep: chapters, summaries of chapters, links to chapters, images, etc. Efficient distribution means pointing data in directions which “trickle out” and which lead to further enrichment of the description, including the addition of user generated content, reviews, etc. Powerful new tools are now widely available to create clear metrics that provide the basis for better informed decisions by institutional purchasers.
  • Re: apply unique ids: “once uniquely identified, always uniquely identified”. Each format through which you publish your book requires its own ISBN, because this thirteen-digit number unmistakably identifies the title, edition, binding, and publisher of a given work. So your paper book will have its own ISBN, the audiobook will have its own ISBN, and the ebook its own ISBN. Re: Evaluate: Many publishers have the resources to do a good job and are doing so; others simply don’t have the resources to put a complete plan together and execute. Nonetheless, creating the best possible data for your content is critical regardless of how it’s accomplished.
  • “Once uniquely identified, always uniquely identified”… by definition
  • If you remember nothing else, remember this!
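
The note above about each format needing its own thirteen-digit ISBN can be checked mechanically. What follows is a minimal sketch, not part of the talk: the function names and the shape of the `editions` dictionary are assumptions made for illustration. It uses the standard ISBN-13 check-digit rule (weights alternating 1 and 3, modulo 10) and flags any ISBN that has been assigned to more than one format.

```python
def isbn13_check_digit(first12: str) -> int:
    """Return the ISBN-13 check digit for the first 12 digits."""
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """True if the string is 13 digits (hyphens/spaces ignored) with a correct check digit."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


def formats_sharing_isbn(editions: dict) -> dict:
    """Map each ISBN used by more than one format to the formats that share it.

    `editions` is a hypothetical mapping of format name -> ISBN string.
    """
    by_isbn = {}
    for fmt, isbn in editions.items():
        by_isbn.setdefault(isbn.replace("-", "").replace(" ", ""), []).append(fmt)
    return {isbn: fmts for isbn, fmts in by_isbn.items() if len(fmts) > 1}
```

A publisher could run a check like this over a title list before syndication; an empty result from `formats_sharing_isbn` means every format has its own identifier.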

    1. SSP Annual Conference, June 2013, Jay Henry
    2. Introduction; Web Scale Discovery in brief & why it matters; Metadata – new ruler of the realm; Life Cycle of Metadata – Publisher as Parent; Evangelic Appeal for Standards; Strategies, Tactics & Pitfalls to Avoid
    3. Many terms tossed around…
        • Federated search, Metasearch, NextGen catalogs, discovery layers --- and now “Web Scale Discovery Service”
        • An improved search experience has always been the motivation behind innovation…
        • The latest generation of tools are something different.
    4. A Definition
        • A pre-harvested central index coupled with a richly featured discovery layer providing a single search across a library’s local, open access, and subscription collections.
        • …but it’s more than that
    5. Not Just Another Search
        • PDA/DDA are purchasing models that were ahead of technology’s ability to properly accommodate them. The acquisition systems developed in conjunction with WSD represent a logical progression of capabilities.
        • Patron-driven acquisition, or PDA, is not new, but it is on the rise. Approximately 400 to 600 libraries worldwide have switched to a patron-driven system for purchasing new works, and that number is likely to double over the next year and a half (2012)
    6. Simple Logical Progression
    7. The Players
    8. Content is King?
        • Metadata is the real ruler of the realm
        • Using descriptions of content to generate purchase and use is more important now than ever
        • So, if we know what the target is, how do we create the best possible metadata?
    9. The Black Box
        • The people who know how these systems work aren’t telling
    10. Lifecycle of Metadata
    11. The Basics (More Is Better)
        • Title
        • Author
        • Format
        • ISBN
        • Subject categories
        • Imprint
        • Link to publisher’s dedicated page
        • Publication Date
        • Price
    12. Data = Sales
        • Titles that meet the BIC Basic standard see average sales 98% higher than those that don’t meet the standard
        • Records with complete BIC Basic data but no image have average sales…of 473% [higher] in comparison to those records which have neither the complete BIC Basic data elements or an image.
        • The difference in average sales between records which…don’t have enhanced metadata, and records which do…have enhanced metadata elements is on average over 2,600 units, which represents an increase of almost 700%
    13. Standard Identifiers… please.
    14. How identifiers help
        • Proper understanding of the customer, whether author, reader or institution
        • Provides a simple basis for wider data governance: Data governance, as defined at Ringgold, is the processes, policies, standards, organization, and technologies required to manage and ensure the availability, accessibility, quality, consistency, auditability, and security of data.
    15. The supply chain [diagram labels: Consortium, Author, Submission and Peer Review System, Publisher, Technology Partner, Subscription Agent or Sales Agent, Fulfilment House or System, Library, Discovery Service WSDs, End User, Data Syndication Targets, Consortium, Societies, Funders, Citation]
    16. The supply chain [diagram labels: Consortium, Submission and Peer Review System, Technology Partner, Subscription Agent or Sales Agent, Fulfilment House or System, End User, Consortium, Societies, Funders, Citation]
    17. The supply chain using identifiers [diagram label: Consortium]
    18. The supply chain using identifiers [diagram label: Consortium]
    19. The supply chain using identifiers [diagram label: Consortium]
    20. Strategy Suggestions
        • Create the most complete metadata possible
        • Distribute widely and efficiently
        • Adhere to standards
        • Uniquely describe each manifestation of a work
        • Develop an internal policy to create uniform data across all published works
    21. Practical Tactics
        • Require Authors to establish an ORCID profile
        • Create links into content, the more specific the better
        • Develop concise descriptions of content (not jacket copy)
        • Include as much as practical – e.g. abstracts of chapters are often written by the authors themselves
        • Apply unique identifiers to establish longevity of the metadata (e.g. ORCID, ISBN, ISSN, DOIs, Ringgold ID, ISNI)
        • Evaluate the benefits of working with outside partners to assist in metadata development, application and syndication
    22. Pitfalls to Avoid
        • Non-Standardised Naming Conventions
        • Result: Poorly associated data in the supply chain.
        • Example 1: Inconsistent author listings, e.g. John Smith, J Smith, Smith J etc. Solution: use ORCID numbers
        • Example 2: Lack of affiliations between authors and institutional customers. Solution: use the Ringgold or ISNI number
        • Example 3: Inability to link author and customer data together. Solution: use the Ringgold or ISNI number
    23. Pitfalls to Avoid (continued)
        • Lack of or Inadequate Subject Classifications and Keywords
        • Result: Dramatic negative effect on the positioning of content in relevancy rankings in discovery or search services
        • Example 1: Applying non-standard subject classifications causes a mismatch against what is expected by libraries or end-users. Solution: Understand the standards and best practices being applied by current systems and similar publishers; provide information in a form that will be most easily utilized by the systems presenting your data
        • Example 2: DDA sales are lost because subjects were applied without using an international standard, resulting in poor search results among international users; cross-discipline keywords lacking entirely, e.g. Football in the US does not mean the same as Football in Europe. Solution: Adopt an internal policy to adhere to an accepted standard at the core of subject description, and then expand the description using keywords in the abstract/summary copy.
    24. Pitfalls to Avoid (continued)
        • Format and versions
        • Result: Confusion within sales and distribution channels
        • Example 1: Users fail to find a compatible format for the title they want. Solution: Apply ISBNs correctly – unique identifier for each e-edition
        • Example 2: Citations are incorrect or inconsistent. Solution: Apply version-specific pagination if appropriate
        • Example 3: Links to content fail over time. Solution: Apply DOIs to establish a persistent and reliable link
        • Example 4: Data is not fully utilized/indexed by discovery systems. Solution: Output information in industry standard formats (ONIX)
    25. Pitfalls
        • Lack of high quality information reduces the likelihood of content being discovered.
    26. References
        • The Ins and Outs of Evaluating Web-Scale Discovery Services, by Athena Hoeppner. http://www.infotoday.com/cilmag/apr12/Hoeppner-Web-Scale-Discovery-Services.shtml
        • Stakeholders Strive to Define Standards for Web-Scale Discovery Systems, by Michael Kelley, October 11, 2012. http://www.thedigitalshift.com/2012/10/discovery/coming-into-focus-web-scale-discovery-services-face-growing-need-for-best-practices
        • White Paper: The Link Between Metadata and Sales, by Andre Breedt, Head of Publisher Account Management; David Walter, Research and Development Analyst, 2012. http://www.isbn.nielsenbook.co.uk/uploads/3971_Nielsen_Metadata_white_paper_A4(3).pdf
        • The BIC Basic standards for bibliographic data provision. http://www.bic.org.uk/17/BIC-Basic/
        • Web-Scale Discovery in an Academic Health Sciences Library: Development and Implementation of the EBSCO Discovery Service, by JoLinda L. Thompson, Kathe S. Obrig & Laura E. Abate. Medical Reference Services Quarterly, Volume 32, Issue 1, 2013. DOI:10.1080/02763869.2013.749111. http://www.tandfonline.com/doi/abs/10.1080/02763869.2013.749111
        • Discoverability Challenges and Collaboration Opportunities within the Scholarly Communications Ecosystem: A SAGE White Paper Update, by Mary M. Somerville, University of Colorado Denver; Lettie Y. Conrad, SAGE. Collaborative Librarianship, Vol 5, No 1 (2013)
        • Affection for PDA, by Steve Kolowich, 2012. Inside Higher Ed. http://www.insidehighered.com/news/2012/06/20/research-foresees-demand-driven-book-acquisition-replacing-librarians-discretion#ixzz2VWOAqWoU
    27. Jay Henry, Vice President, Ringgold Inc., Jay.henry@ringgold.com
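
Two of the points in the slides lend themselves to a small illustration: the "Basics" list on slide 11 and the inconsistent author listings on slide 22. The sketch below is one possible way to audit a title record and to group name variants under a shared ORCID iD. The field names, the record layout, and both function names are assumptions for illustration, not a standard schema such as ONIX, and the ORCID iD in the usage example is only a placeholder value.

```python
from collections import defaultdict

# Hypothetical field names standing in for the nine "Basics" on slide 11.
BASIC_FIELDS = [
    "title", "author", "format", "isbn", "subject_categories",
    "imprint", "publisher_url", "publication_date", "price",
]


def missing_basic_fields(record: dict) -> list:
    """Return the basic fields that are absent or blank in a title record."""
    missing = []
    for field in BASIC_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


def group_names_by_orcid(contributors: list) -> dict:
    """Collect every name variant seen for each ORCID iD.

    `contributors` is a list of (name, orcid_id) pairs; variants such as
    "John Smith" and "Smith J" fall under one key when they share an iD.
    """
    names = defaultdict(set)
    for name, orcid_id in contributors:
        names[orcid_id].add(name)
    return dict(names)


if __name__ == "__main__":
    record = {"title": "Example Monograph", "author": "John Smith", "isbn": "9780306406157"}
    print(missing_basic_fields(record))
    people = [("John Smith", "0000-0002-1825-0097"), ("Smith J", "0000-0002-1825-0097")]
    print(group_names_by_orcid(people))
```

The design choice here is deliberately simple: the identifier, not the name string, is the grouping key, which is the point the slide makes about using ORCID numbers instead of trying to reconcile spellings.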