BookServer: A Web of Books



Description of the origins and development of the BookServer architecture and the Open Publication Distribution System (OPDS). Why OPDS Catalogs can help build a web of books. Discussion of the challenges ahead.

Published in: Business


  1. 1. BookServer: A web of books Peter Brantley Internet Archive NISO . BNC . 2010
  2. 2. I. Opportunity and Vision
  3. 3. Motivating issues Entering the digital fold, a tangled landscape: 1. finding the book 2. format of the book 3. acquiring the book
  4. 4. Finding the book • Web search? (Google, Bing, etc.) • Publisher website? ( ... ) • The local library? (borrowing/lending) • Online bookstore? (Amazon, Indigo, B&N) • Indie bookstore? (Vroman’s, Powell’s) • Alt. vendor? (Smashwords, Kobo)
  5. 5. Format of the book • Highly structured display (pdf) • Downloadable book package (epub, mobi) • Web- or “cloud”-based (Google Editions) • Non-standard enhanced book (Blio) • Not really available at all (ILL)
  6. 6. Acquiring the book Reading systems – • Amazon Kindles, Sony Readers, B&N nook • IBIS Reader, Aldiko, Stanza, Kobo • Standard desktops and laptops • Game consoles (Wii) • Apple iPad
  7. 7. “ripping hair out” + Device + Format + Discovery + Acquisition + Installation ( + DRM ) = Confusion.
  8. 8. What readers want What readers want to have .. Be able to find the books they want, in the formats that they can use, for the device that they have, and not have it be painful.
  9. 9. Book distributors What publishers, libraries, bookstores want - Make books available for discovery, with accurate descriptive information, at as many different places as possible, under the sales / use terms permitted.
  10. 10. Even the Feds
  11. 11. For the United States Even the U.S. Dept of Justice is an advocate: “[book] data provided should be available in multiple, standard, open formats supported by a wide variety of different applications, devices, and screens.”
  12. 12. Wanted: Web of Books
  13. 13. BookServer: A future for books Creating a new architecture using common, open standards that permits people to find, buy, acquire, and read books from any source, on any device, using many different ebook applications.
  14. 14. The Heritage: Lexcycle’s Stanza
  15. 15. Relation: Library catalogs Library 2.0 Gang (02/09): Google books and libraries “Open Catalogue Crawling Protocol” Google, DLF, Talis, and others Atom vs Sitemap discussions
  16. 16. Stages of support IDPF Board Tools of Change (NYC, Feb 2009) Web Expo 2.0 (SF, Apr 2009)
  17. 17. OPDS “Catalog” launch “The Open Publication Distribution System (OPDS) is a generalization of the Atom [XML] approach used by Stanza's online catalog. ... I believe this effort has the potential to be a critical enabler to the growth in access to, and adoption of, digital books.” - Bill McCoy, Adobe, 04.09
  18. 18. Getting the terms right 1. “BookServer” is the architecture. 2. “OPDS” is the technical specification. 3. “Catalogs” are made using OPDS. 4. “Atom” is the XML scheme for OPDS.
  19. 19. Based on Atom Because OPDS is based on a commonly used XML standard, called Atom, OPDS Catalogs can be read by – • web browsers • news readers (rss) • mobile applications
  20. 20. Catalogs scale Because Catalogs are easy to make – • Any web site can run a bookstore/library. • Libraries, bookstores, publishers can play. • Search engines can serve as book gateways. • Aggregators can harvest multiple catalogs.
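Because a Catalog is ordinary Atom, any generic XML or feed tool can read one. The following is a minimal sketch, not taken from the talk: the sample feed, its `urn:example:` identifiers, and the `list_titles` helper are all made up for illustration, and only the Atom namespace URI is a real, standard value.

```python
import xml.etree.ElementTree as ET

# The standard Atom namespace, in ElementTree's "{uri}" prefix form.
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical catalog: identifiers and contents are illustrative only.
SAMPLE_CATALOG = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Catalog</title>
  <id>urn:example:catalog</id>
  <updated>2010-03-01T00:00:00Z</updated>
  <entry>
    <title>Moby-Dick</title>
    <id>urn:example:book:1</id>
    <updated>2010-03-01T00:00:00Z</updated>
    <author><name>Herman Melville</name></author>
  </entry>
  <entry>
    <title>Walden</title>
    <id>urn:example:book:2</id>
    <updated>2010-03-01T00:00:00Z</updated>
    <author><name>Henry David Thoreau</name></author>
  </entry>
</feed>"""


def list_titles(xml_text):
    """Return (title, author) pairs for every Atom entry in the feed."""
    root = ET.fromstring(xml_text)
    return [
        (entry.findtext(ATOM + "title"),
         entry.findtext(f"{ATOM}author/{ATOM}name"))
        for entry in root.iter(ATOM + "entry")
    ]


if __name__ == "__main__":
    for title, author in list_titles(SAMPLE_CATALOG):
        print(f"{title} by {author}")
```

Nothing here is specific to books: the same code would list the entries of a blog feed, which is the point of building on Atom.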
  21. 21. Distribution format Because Catalogs contain simple data describing books and their availability – Catalogs can also be used for B2B, to distribute data to partners for “harvest” instead of using complicated standards. (Future: “real time web” notifications.)
  22. 22. Journals good to go
  23. 23. Delivering article level
  24. 24. What’s in this thing? Catalogs provide manifests – • List of the titles available • Information about each title • Formats the title is available in • Ways the title can be acquired
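The four manifest parts map naturally onto one Atom entry per title. The sketch below builds such an entry; it is an illustration under stated assumptions, not the specification text. The `http://opds-spec.org/acquisition` link relation is the one used in the later published OPDS 1.x specification and may differ from the 2010 draft, and the `build_entry` helper, the book record, and the `example.org` URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
DC_NS = "http://purl.org/dc/terms/"
# Assumption: relation name from the published OPDS 1.x specification.
ACQUISITION_REL = "http://opds-spec.org/acquisition"

ET.register_namespace("", ATOM_NS)
ET.register_namespace("dc", DC_NS)


def q(namespace, tag):
    """Qualified tag name in ElementTree's '{namespace}tag' form."""
    return "{%s}%s" % (namespace, tag)


def build_entry(book):
    """Build one catalog entry: what the title is, and how to get it."""
    entry = ET.Element(q(ATOM_NS, "entry"))
    # Information about the title.
    ET.SubElement(entry, q(ATOM_NS, "title")).text = book["title"]
    ET.SubElement(entry, q(ATOM_NS, "id")).text = book["id"]
    author = ET.SubElement(entry, q(ATOM_NS, "author"))
    ET.SubElement(author, q(ATOM_NS, "name")).text = book["author"]
    ET.SubElement(entry, q(DC_NS, "language")).text = book["language"]
    # One link per available format: the media type names the format,
    # the href says where it can be acquired.
    for fmt in book["formats"]:
        ET.SubElement(entry, q(ATOM_NS, "link"), {
            "rel": ACQUISITION_REL,
            "type": fmt["type"],
            "href": fmt["href"],
        })
    return entry


# Hypothetical record; identifiers and URLs are placeholders.
SAMPLE_BOOK = {
    "title": "Walden",
    "id": "urn:example:book:2",
    "author": "Henry David Thoreau",
    "language": "en",
    "formats": [
        {"type": "application/epub+zip", "href": "http://example.org/walden.epub"},
        {"type": "application/pdf", "href": "http://example.org/walden.pdf"},
    ],
}

if __name__ == "__main__":
    print(ET.tostring(build_entry(SAMPLE_BOOK), encoding="unicode"))
```

A full catalog would wrap entries like this one in an Atom `feed` element, which supplies the list of titles.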
  25. 25. How it works A reader ... 1. Browses a Catalog of titles - 2. selects a title for more information - 3. makes a purchase/borrow decision - 4. obtains book (PayPal, Amazon, Google) - 5. installs and reads the book.
  26. 26. A good catalog ... For best user experience: • Intelligent hierarchy • Flexible search • Extensive faceting • Human touch
  27. 27. Made easily Catalogs can be derived from basic bibliographic metadata. Such as: ONIX, MARC, (ahem) spreadsheets (Internally OPDS Catalogs use simple Dublin Core metadata.)
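As a rough illustration of the spreadsheet case, the sketch below maps the rows of a CSV export onto Dublin Core term names. The column names, the mapping, the placeholder ISBN, and the `sheet_to_records` helper are assumptions made for this example; a real publisher spreadsheet would need its own mapping.

```python
import csv
import io

# Assumed spreadsheet columns mapped to Dublin Core terms.
COLUMN_TO_DC = {
    "title": "dcterms:title",
    "author": "dcterms:creator",
    "isbn": "dcterms:identifier",
    "language": "dcterms:language",
}

# Hypothetical export; the ISBN is a placeholder, not a real number.
SAMPLE_SHEET = """title,author,isbn,language
Moby-Dick,Herman Melville,0000000001,en
Walden,Henry David Thoreau,,en
"""


def sheet_to_records(csv_text):
    """Turn each spreadsheet row into a dict keyed by Dublin Core term.

    Empty cells are dropped rather than emitted as blank metadata.
    """
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {}
        for column, term in COLUMN_TO_DC.items():
            value = (row.get(column) or "").strip()
            if value:
                record[term] = value
        records.append(record)
    return records


if __name__ == "__main__":
    for record in sheet_to_records(SAMPLE_SHEET):
        print(record)
```

Each resulting record holds what one catalog entry needs; ONIX or MARC input would only change the extraction step, not the target fields.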
  28. 28. Why not ONIX? ONIX (and BISG “BookDROP”) are: • Designed for different use cases • Complex standards with many options • Not widely used beyond publishing • Not understood by web browsers • Established; change is difficult
  29. 29. Catalogs are emergent Because we use open standards for describing data, it is possible to link bibliographic book data more easily.
  30. 30. Linking books Catalogs can tie together – • Book reviews • Reading lists • Annotations • Fan fiction
  31. 31. Make Books Apparent A workshop sponsored by the Internet Archive October 19-20, Fort Mason, San Francisco, CA With the assistance of (among many others): • O’Reilly Media • Threepress • Feedbooks • Book Oven
  32. 32. Interested parties (03.2010) • Adobe • Ingram Digital • Aldiko • Inkmesh • (Amazon) Lexcycle • O’Reilly Media • Applewood Books • OLPC • Book Oven • Pixel Qi • Feedbooks • Kobo Books • Floss Manuals • Threepress • HumanWare • Voyager Japan
  33. 33. Part II: Meeting the challenge
  34. 34. Building the ecosystem For this to work, we need: 1. Good (independent!) reading systems 2. Books, journals, magazines, and more 3. $ Revenue nexus: publishers must contribute frontlist 4. Mobile reading systems 5. Aggregators (incl. search)
  35. 35. We’re in draft
  36. 36. We have issues!
  37. 37. Issues – I Aggregation Two roles for OPDS: 1. simple publication 2. catalog aggregation Aggregating resembles metasearch: out of many sources must come order.
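A minimal sketch of the aggregation role, assuming each source catalog has already been parsed into plain dictionaries. The `aggregate` function, the data layout, and the `urn:example:` identifiers are hypothetical. It also assumes that two sources describing the same book share an identical identifier, which is exactly what the metadata and identifier issues in this part of the talk suggest cannot be taken for granted.

```python
def aggregate(catalogs):
    """Merge entries from several catalogs into one ordered list.

    `catalogs` maps a source name to a list of entries, each a dict with
    "id", "title" and "links" (acquisition URLs). Entries sharing an id
    are collapsed into one record that lists every source's offers.
    """
    merged = {}
    for source, entries in catalogs.items():
        for entry in entries:
            record = merged.setdefault(
                entry["id"],
                {"id": entry["id"], "title": entry["title"], "offers": []},
            )
            for href in entry["links"]:
                offer = {"source": source, "href": href}
                if offer not in record["offers"]:
                    record["offers"].append(offer)
    # "Out of many sources must come order": sort by title.
    return sorted(merged.values(), key=lambda r: r["title"].lower())


# Hypothetical sources; names, ids and URLs are placeholders.
SAMPLE_CATALOGS = {
    "library": [
        {"id": "urn:example:walden", "title": "Walden",
         "links": ["http://library.example.org/walden.epub"]},
    ],
    "bookstore": [
        {"id": "urn:example:walden", "title": "Walden",
         "links": ["http://store.example.org/walden.epub"]},
        {"id": "urn:example:moby-dick", "title": "Moby-Dick",
         "links": ["http://store.example.org/moby-dick.epub"]},
    ],
}

if __name__ == "__main__":
    for record in aggregate(SAMPLE_CATALOGS):
        sources = ", ".join(o["source"] for o in record["offers"])
        print(f'{record["title"]}: {sources}')
```

In practice the hard part is the key: without shared identifiers, the exact-match lookup would have to become fuzzy matching on title and author.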
  38. 38. Issues – II Metadata Matching title <> reader is not trivial. FRBR, recommending, clustering - and then there is plain old GIGO
  39. 39. Issues – III Identifiers OMG. Where does one start? - Author, work, and subjects. Data from publishers (book and journal); libraries, trade organizations and assns.
  40. 40. Issues – IV.a Territorial Rights Publishers carve up markets into territories, geographic and language-based. One publisher might have UK, AU, NZ rights, whilst another might possess U.S. rights. Spanish publishers typically retain worldwide Spanish-language rights.
  41. 41. Issues – IV.b Territorial Rights Territorial rights make zero sense for digital editions (n.b. language might). Publishers must obtain non-geographic rights for electronic text versions. (Regional DVD codes are a sad analogy.)
  42. 42. Issues – V Search OPDS defines search via OpenSearch. OpenSearch ver status is “under development” and not really “owned” by anyone (origin: A9). Could benefit from support and enhancement.
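OpenSearch describes a search endpoint as a URL template with placeholders such as `{searchTerms}`, where a trailing `?` marks a parameter as optional. The sketch below shows how a reading application might fill in such a template. The `expand_template` helper and the `catalog.example.org` endpoint are made up for illustration, and the sketch covers only the substitution step, not the description document that carries the template.

```python
import re
from urllib.parse import quote

# Matches {name} and {name?}; the "?" marks an optional parameter.
_PARAM = re.compile(r"\{([A-Za-z][A-Za-z0-9:]*)(\??)\}")

# Hypothetical endpoint for illustration.
TEMPLATE = "http://catalog.example.org/search?q={searchTerms}&page={startPage?}"


def expand_template(template, values):
    """Fill an OpenSearch-style URL template with percent-encoded values.

    Optional parameters with no value become an empty string; a missing
    required parameter raises ValueError.
    """
    def substitute(match):
        name = match.group(1)
        optional = match.group(2) == "?"
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""
        raise ValueError(f"missing required parameter: {name}")

    return _PARAM.sub(substitute, template)


if __name__ == "__main__":
    print(expand_template(TEMPLATE, {"searchTerms": "moby dick"}))
```

The response to the expanded URL would itself be an Atom feed of matching entries, so search results can be read by the same code as any other Catalog.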
  43. 43. Issues – VI Faceting On a small screen device, faceting must be a normative discovery user interface form. What is baked in? – Top-20. Classics. New. What is algorithmically derived, on the fly? How can one do this against aggregations?
  44. 44. Issues – VII Bookshelves Users should be able to define and maintain their own book lists in OPDS format. Ideally, these should be portable across book hosting services.
  45. 45. Issues – VIII DRM Bad word, but many publishers still reliant. Best market solution: Adobe ACS4 Pay per transaction model. Desperate need for open source solution. (Perhaps premised on “social-DRM” spec.)
  46. 46. Issues – IX Vending Not a trivial problem. Need an abstracted selling API. Application elicits essential purchaser data, then handles the transaction “under the covers”: PayPal, Google Checkout, Amazon Checkout.
  47. 47. Issues – X Lending Internet Archive would like to lend books (directly, not via a third-party). Is every lending a renting? (no ... !) Is there digital first-sale? (yes ... !) Options: ACS4, streaming (cloud)
  48. 48. Issues – XI Hello World! Currently no way for new OPDS Catalogs to announce themselves to the world. We have discussed a “ping server” to aid the auto-aggregation of Catalogs. This remains a manual notification process.
  49. 49. Join in! OpenPub on Google Code:
  50. 50. Ask the question
  51. 51. thanks! peter brantley internet archive san francisco ca @naypinya (twitter) peter