Open Library at the API Workshop
 


Presented February 26, 2011 at The Maryland Institute for Technology in the Humanities.

    Open Library at the API Workshop: Document Transcript

    • Hello. MITH API Workshop. George Oates, Maryland, February 2011. (Monday, April 11, 2011)
    • Some rights reserved by mattdork
      I work at the Internet Archive, leading the Open Library project. We recently moved into this church in the Richmond in San Francisco. We’re turning it into a library.
    • We’re based in San Francisco, California, where I happen to have been living for about 5 years.
    • Universal Access to All Knowledge
      Since 1996, the non-profit Internet Archive has been building a digital library of Internet sites and other things in digital form. archive.org has a ton of texts, video, software, live music... all sorts of things. Our mission is Universal Access to All Knowledge. Not a bad reason to get out of bed each day...
    • Some rights reserved by heather
      It’s not your traditional non-profit... Lots of the staff are technologists and developers.
    • archive.org
      We have many computers. They store over:
        - 100,000 hours of TV from channels all over the world
        - 250,000 moving images or video
        - 500,000 audio recordings
        - 2.5 million scanned texts
        - 150,000,000,000 web pages
    • By rkumar
      Just the other day we had 2.88 petabytes of hard drives delivered. That’s enough storage for about 2 billion books.
    • Another major part of what we do is scanning books. This is a picture of one of the scanning centers in San Francisco. We currently employ about 200 staff scanning books.
    • And today, we have over a million free texts available online. That includes:
        - over 1 million books
        - 150 million pages scanned
        - 1,000 books scanned EVERY day
        - 24 scanning centers in 5 countries, and we hope for more.
    • We’re also scanning microfilm, which is much faster than individual books. Here’s an example of the record of the population census from 1790 to 1930, scanned from microfilm from the collections of the Allen County Public Library and originally from the United States National Archives and Records Administration.
    • Examples of Cross Writing from Boston Public Library
    • Over 1 million free books that you can read on archive.org today, and access through the Open Library site, by checking the little “Only eBooks” box as you search.
    • As well as being able to download these books in a variety of different formats, from PDF to TXT and more, we also have a web-based book reader, which you can use to read our scanned texts within your web browser, without the need for any additional software. At the end of 2010, we released a new version of our open source, browser-based BookReader. I’ve actually come to Wellington direct from a meeting in San Francisco called Books in Browsers, held at the Internet Archive last week. It was there that we announced an upcoming new release of our BookReader, which will hopefully go live in the next few weeks... Here are some screenshots...
    • The main reason we wanted to improve on the current design was to try to build an “app-level quality” book reading experience right in the browser. This included several improvements for touch interfaces in browsers on devices like the iPad. From a straightforward design perspective, there were also improvements to be made on usability and simple stuff like making the book bigger in the browser window.
    • This is a screenshot with the toolbar open, where you can see new features like a navigation bar at the bottom that allows you to scroll through the book, and a “read to me” feature which plays the book in a computer-y voice and highlights what’s being read. Also, if we know a table of contents for the book, each chapter is mapped along the navigation bar. We’ve also rewritten the full text search engine, and I’ll talk more about that a bit later.
    • By rkumar
      Apologies for the slightly blurry picture, but this is my boss, Brewster Kahle, who founded the Internet Archive back in 1996. He’s playing with a touchscreen which is displaying the new BookReader. The screen’s been installed in one of the reading desks that used to sit in the reading room of the Christian Science church before it became our new home. A big part of the BookReader redesign was to evolve an app-level quality book reading experience within a web browser. If you have an iPad, I’d encourage you to try it!
    • The Open Library project was launched back in 2007. In May 2010, we launched a total site redesign. Just last week, we released a revised home page, building on our new Lending program, and generally trying to do a better job of communicating that you can come to Open Library to find something to read for free, or a book to borrow. We also added activity graphs to try to show that there’s stuff happening, all day, every day.
    • A “Wikipedia for Books”
      There are a few different ways to describe what Open Library is, but I think the explanation that makes the most sense is “a Wikipedia for Books”.
    • Scrolling down the home page...
    • We have a lending library of some 10,000 20th Century books. You can also access another 80,000 books if you’re (literally) sitting in one of the 150 or so libraries participating in our “In-Library Lending” program. Each participating library contributes eBooks into the in-library pool, and you can borrow anything in the pool, once you’re sitting in one of the libraries.
    • Yay! Graphs going up! (That peak you can see across the graphs is our lending launch. For more info, read “Get Thee to a Library!” http://blog.openlibrary.org/2011/02/22/get-thee-to-a-library/)
    • Snapshot of the various combinations of links we can provide to get you to books... For books we can’t lend through our own lending program, we’ve connected to Overdrive... We’re hoping to make the vendors you can buy from more dynamic, and open up the sources for online free texts. Right now, it’s just the Internet Archive texts that we link to in full.
    • lending ebooks • map / openstreen
      You can browse a map of (mainly North American) libraries participating in the In-Library Lending program. If you’re interested in joining, please contact us!
    • borrow page • screen
      Here’s what a page looks like to borrow a book. You can see 3 options: In Browser, PDF, and ePub. In-browser is available immediately. You need to download/install Adobe Digital Editions to read PDF or ePub versions.
    • Developer Resources
    • Open Library http://openlibrary.org/developers
      Python, Postgres, SOLR, JSON, REST
    • http://github.com/openlibrary
      We certainly have our code online at GitHub, but we rarely receive patches. I’m OK with this, at least for now.
    • JSON/RDF http://openlibrary.org/developers
    • JSON blob
    • HTML, JSON, RDF
        • http://openlibrary.org/works/OL69181W/
        • http://openlibrary.org/works/OL69181W.json
        • http://openlibrary.org/works/OL69181W.rdf
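As a rough illustration of the three representations on this slide, the sketch below builds the HTML, JSON and RDF URLs for a record key. The function name `representation_urls` and the `BASE_URL` constant are invented for this example; only the URL shapes are taken from the slide.

```python
BASE_URL = "http://openlibrary.org"  # host as shown on the slide


def representation_urls(key):
    """Return the HTML, JSON and RDF URLs for a key such as '/works/OL69181W'.

    Hypothetical helper: it only formats strings and makes no network request.
    """
    path = "/" + key.strip("/")
    return {
        "html": f"{BASE_URL}{path}/",
        "json": f"{BASE_URL}{path}.json",
        "rdf": f"{BASE_URL}{path}.rdf",
    }


if __name__ == "__main__":
    for fmt, url in representation_urls("/works/OL69181W").items():
        print(fmt, url)
```

Fetching the `.json` URL with any HTTP client should then return the record as a JSON document, which is the "JSON blob" view of the same page.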
    • Data Dumps http://archive.org/details/ol_data
    • archive.org/details/ol_data
      There’s a copy of everything we’re using on the Internet Archive too.
    • API http://openlibrary.org/developers/api
        • Books
        • Covers
        • Search inside
        • Subjects
        • Recent Changes
        • Lists
      Open Library has a RESTful API, best used to link into Open Library data in JSON, YAML and RDF/XML.
    • Request: http://openlibrary.org/dev/docs/api/lists
    • We built lists for a couple of reasons: 1, to help people collect things together, and 2, to make it easy to get at smaller sets of records.
    • Covers http://openlibrary.org/developers/api
    • Where:
        • key can be any one of ISBN, OCLC, LCCN, OLID and ID (case-insensitive)
        • value is the value of the chosen key
        • size can be one of S, M and L for small, medium and large respectively.
    • (we use this)
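The slide that showed the cover URL itself did not survive in this transcript, so the pattern used below (`http://covers.openlibrary.org/b/$key/$value-$size.jpg`) is an assumption taken from the public Covers API documentation rather than from the slide. The helper name `cover_url` is hypothetical, and the key list is read as ISBN, OCLC, LCCN, OLID and ID.

```python
COVERS_BASE = "http://covers.openlibrary.org/b"  # assumed base URL, not on the slide
VALID_KEYS = {"isbn", "oclc", "lccn", "olid", "id"}
VALID_SIZES = {"S", "M", "L"}


def cover_url(key, value, size="M"):
    """Build a cover image URL from a key type, its value and a size letter."""
    key_norm = key.lower()  # keys are described as case-insensitive
    size_norm = size.upper()
    if key_norm not in VALID_KEYS:
        raise ValueError(f"unsupported key type: {key!r}")
    if size_norm not in VALID_SIZES:
        raise ValueError(f"size must be one of S, M, L, got {size!r}")
    return f"{COVERS_BASE}/{key_norm}/{value}-{size_norm}.jpg"


if __name__ == "__main__":
    print(cover_url("ISBN", "0385472579", "S"))
```

Validating the key and size up front keeps a typo from turning into a request for an image that cannot exist.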
    • Yay!
    • DOUBLE Yay!
    • One of quite a few examples of Open Library in the wild is the National Library of Australia’s new search engine, Trove.
    • You can see there that there are links to Open Library books wherever one can be sourced. There are a growing number of sites making use of Open Library data... and that’s what we’re all about - data in, data out. The more interconnections we can make with other systems, the easier it will be for people to land where they want to go inside Open Library.
    • This is ImportBot. He gets new catalog records from the Library of Congress and puts them into Open Library every Tuesday. We also import records from Amazon, and from the Internet Archive. ImportBot looks for recently scanned books, and creates new records (or merges them with existing ones) just a few minutes after the record is created on the Internet Archive.
    • You can see ImportBot working away, just like you can see the Wiki’s edit history for every person who edits something.
    • Another quick note on data in before I move on... We’ve been experimenting with a couple of other “surgical” bots that look across the catalog and connect edition records directly to other services by stamping identifiers from other systems into Open Library. This is a bot written by a developer called Ben Gimpert. It takes a file mapping ISBNs to Goodreads IDs, looks for ISBN matches in OL, then adds the Goodreads ID to those records. This allows us to construct links to Goodreads, and to make the Goodreads ID available through the API.
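The matching step such a bot performs can be sketched in a few lines. This is not Ben Gimpert’s code: the record layout (`isbn_10`, `isbn_13`, `identifiers`), the function name and the sample IDs are all assumptions made for illustration, and the records are plain in-memory dicts rather than live Open Library data.

```python
def stamp_goodreads_ids(editions, isbn_to_goodreads):
    """Add a Goodreads ID to every edition whose ISBN appears in the mapping.

    Returns the number of editions that were changed.
    """
    updated = 0
    for edition in editions:
        isbns = edition.get("isbn_10", []) + edition.get("isbn_13", [])
        for isbn in isbns:
            goodreads_id = isbn_to_goodreads.get(isbn)
            if goodreads_id is None:
                continue  # this ISBN is not in the mapping file
            ids = edition.setdefault("identifiers", {}).setdefault("goodreads", [])
            if goodreads_id not in ids:  # keep the update idempotent
                ids.append(goodreads_id)
                updated += 1
            break  # the first matching ISBN is enough for this edition
    return updated


if __name__ == "__main__":
    # Hypothetical records and a made-up Goodreads ID.
    editions = [
        {"key": "/books/EXAMPLE1", "isbn_10": ["0385472579"]},
        {"key": "/books/EXAMPLE2", "isbn_13": ["9780000000002"]},
    ]
    print(stamp_goodreads_ids(editions, {"0385472579": "12345"}))
    print(editions[0]["identifiers"])
```

Because the ID is only appended when it is missing, running the bot twice over the same file leaves the records unchanged the second time.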
    • You can see we’ve added a little widget on the page that connects to Goodreads. If you have an account, you can add our records to your lists on Goodreads. There’s also a LibraryThing ID, added by a similar batch bot update. Writing bots to do things like this is the sort of development we’d like to open up to external developers too...
    • BookReader http://openlibrary.org/dev/docs/ia
    • The Library of Congress is using our BookReader on read.gov. There are quite a few other examples of the IA BookReader out there on the web. Hopefully the redesign (with touch interactions etc.) will attract new people too...
    • Princeton Digital Library
    • Internet Archive http://openlibrary.org/dev/docs/ia
    • http://archive.org/help
    • Raw Full Text > 4 million documents with metadata
    • Stanford NLP thing http://nlp.stanford.edu/
      We’ve just begun experimenting with some of the software made by the Stanford Natural Language Processing Group, which includes members of both the Linguistics Department and the Computer Science Department. One idea is to fold this software into the scanning process, so we can do a first pass on entity extraction on the full text of a book, to extract things like names, places and common subjects...
    • But then of course, you can do cool stuff like this :)
    • Challenges
    • http://flic.kr/p/6zyU3U Tension?
      The Taxonomy vs Folksonomy debate may be represented thusly.
    • 1) Books are for use.
      2) Every reader his [or her] book.
      3) Every book its reader.
      4) Save the time of the User.
      5) The library is a growing organism.
      So, on the basis of the idea of our current catalog being a substrate, as Ranganathan suggests in his five laws of library science...
    • So... Open Library is a virtual space. Its organization isn’t constrained like a physical catalog. In fact, the more connections you can make into one of our “virtual index cards” the more ways people have to discover and navigate its contents.
      http://www.flickr.com/photos/brixton/1394845916/
    • http://flic.kr/p/6pmtQL
      But, librarians are (very clever) humans too. And everyone who’s responsible for putting books into a traditional catalogue must work within patterns. Patterns that have grown semantically remarkable and deeply complex.
    • http://openlibrary.org/search?author=author
        Unknown author            403
        Unknown Author            358
        Author unknown            254
        No Author                 145
        Author Unknown             59
        No Author.                 54
        Author                     20
        No author.                 16
        No author                  12
        unknown author              8
        Unknown Author Unknown      7
        no author                   7
        No Author Stated            7
        (No Author)                 6
        No author noted             5
        No author noted.            4
        no author listed            4
        (no author)                 4
        Author Not Stated           4
        Author.                     4
        No author specified         3
        Miscellaneous Author        3
        no Author                   3
        Author One                  3
        Multi-Author                3
        No Author Listed            3
        No Stated Author            3
        Author Anonymous            2
        (no author given)           2
        Author                      2
        Author Wright               2
        Unkown Author               2
        No author stated            2
        Mms suspense author         2
        Author Test                 2
        TEST AUTHOR                 2
      Duplicate authors (and editions) are an issue... This is an example search for author records with “author” in their names... you can see the variety of ways that catalogers have noted unknown authors...
    • http://www.flickr.com/photos/blackbeltjones/4294354526/
      We’ve noticed a TON of minor variations in the way cataloguers enter data... Trivial to us, but very hard for computers to differentiate.
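To make the "trivial to us, hard for computers" point concrete, here is a minimal sketch of how one might fold case and punctuation variants of "unknown author" together. It is an illustration only, not something Open Library is described as running: the phrase list and the names `normalize` and `is_unknown_author` are assumptions, and a misspelling such as "Unkown Author" still slips through.

```python
import re

# Phrases treated as "no real author", matched after normalization.
# This list is an assumption drawn from the variants on the slide.
UNKNOWN_AUTHOR = re.compile(
    r"unknown author|author unknown|no stated author|author not stated"
    r"|no author( (stated|noted|listed|specified|given))?"
)


def normalize(name):
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    letters_only = re.sub(r"[^a-z ]+", " ", name.lower())
    return " ".join(letters_only.split())


def is_unknown_author(name):
    """True when the whole normalized name is one of the known placeholder phrases."""
    return UNKNOWN_AUTHOR.fullmatch(normalize(name)) is not None


if __name__ == "__main__":
    for name in ["No Author.", "(no author given)", "Author Wright", "Unkown Author"]:
        print(repr(name), is_unknown_author(name))
```

Requiring a full match, rather than searching for the phrase anywhere in the name, is what keeps real names like "Author Wright" from being swept up with the placeholders.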
    • Substrate: any surface on which a plant or animal lives or on which a material sticks
      Some rights reserved by Brynja Eldon
      We have a repository that mostly contains records created by professionals. I find it useful to consider these records as a substrate, something that can be reacted upon.
    • What if we consider the source Open Library records like that?
      Some rights reserved by Brynja Eldon
      Now that we’ve begun to reveal this substrate, how will people react to it? What reactions has it caused so far?
    • Handwritten scribbles and scrawls; annotations; corrections
    • Some rights reserved by jared
      What if a catalog looks like this? Is crystalline? What if it is unconstrained by the need to sort, say, alphabetically? From the artist of this image, Jared Tarbell: “Lines like crystals form at perpendicular angles to existing lines. A complex form emerges. 1000 classic computational substrate, color palette stolen from Jackson Pollock: A simple perpendicular growth rule creates intricate city-like structures. The simple rule, the complex results, the enormous potential for modification; this has got to be one of my all time favorite self-discovered algorithms. Lines likes crystals grow on a computational substrate.”
    • 000s of edits per month
      What happens when you introduce turbulence into the catalog? Here are a few examples of the sorts of edits we’re seeing... at a rate of about 100,000 edits per month. If you don’t stimulate an organism, it atrophies.
      http://www.flickr.com/photos/rreis/4859722551/sizes/l/
    • Activity/History
      One of the key components to any happy social system is the visibility of other people, and a sense of activity. This is one of the key elements we’re focussed on in the redesign. This particular list shows all edits by humans on Open Library, and actually, turns out to be a handy way to spot check what’s happening. You’ll notice too, there’s a special tab for the variety of edits that we run across the system using bots. Often pretty mechanical and repetitive, we found that the bots obscure the humans if you just mush everything up in a big list, so we separated them.
    • Activity/History Live Data
    • Solutions?
    • Shelf http://www.flickr.com/photos/emdot/400280705/
      I really like how Raymond described his book yesterday, that as soon as he’d written it, it began to decay... Concrete, decay
    • Network http://www.flickr.com/photos/arenamontanus/352130655/
      Plastic, self-healing
    • Minimum Viable Record
      Now, I want to try a little exercise. I’m going to hand out an index card to all of you, and ask you to nominate 5 fields that you think are enough to describe a book. I’ll collate the results and report back later.
    • Stamen Design in SF. Got funding from Knight Foundation to build Citytracking. Challenge is a “hodgepodge of bits (including APIs [2] and official sources, scraped websites, sometimes-reusable data formats and datasets, visualizations, embeddable widgets etc.) is fractured, overly technical and obscure, held in the knowledge base of a relatively small number of people, and requires considerable expertise to harness.”
    • Open Publication Distribution System (OPDS) http://bookserver.archive.org/catalog/new
      This is an example of trying something very bare bones, to try to help systems intercommunicate more easily. (Open Library plans to publish OPDS feeds soon.)
      The Open Publication Distribution System (OPDS) Catalog specification is a syndication format for electronic publications based on Atom (RFC 4287) and HTTP (RFC 2616).
    • American notes for general circulation [microform]
      February 25, 2011 10:22 AM
      Author: Dickens, Charles, 1812-1870
      Publisher: New York : Harper
      Year published: 1842
      Book contributor: Canadiana.org
      Language: en
      Download Ebook: (PDF) (EPUB)
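Since OPDS catalogs are Atom documents, an entry like the one above can be read with an ordinary XML parser. The sketch below is a guess at what such an entry might look like on the wire: the sample XML is hand-written for illustration (the `example.org` download links are placeholders, and the acquisition `rel` value is taken from the OPDS specification, not from the slide), and `parse_entry` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
ACQUISITION_REL = "http://opds-spec.org/acquisition"  # per the OPDS spec

# Hand-written sample; title and author come from the slide, links are placeholders.
SAMPLE_ENTRY = """\
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>American notes for general circulation [microform]</title>
  <author><name>Dickens, Charles, 1812-1870</name></author>
  <link rel="http://opds-spec.org/acquisition" type="application/pdf"
        href="http://example.org/book.pdf"/>
  <link rel="http://opds-spec.org/acquisition" type="application/epub+zip"
        href="http://example.org/book.epub"/>
</entry>
"""


def parse_entry(xml_text):
    """Pull the title, author name and download links out of one Atom entry."""
    entry = ET.fromstring(xml_text)
    downloads = {
        link.get("type"): link.get("href")
        for link in entry.findall(f"{ATOM}link")
        if link.get("rel") == ACQUISITION_REL
    }
    return {
        "title": entry.findtext(f"{ATOM}title"),
        "author": entry.findtext(f"{ATOM}author/{ATOM}name"),
        "downloads": downloads,
    }


if __name__ == "__main__":
    print(parse_entry(SAMPLE_ENTRY))
```

The appeal of the bare-bones approach shows here: nothing beyond a standard Atom parser is needed to find out what a book is and where to download it.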
    • Individuals can also add new books with a few details like Title, Author, Publisher and Publish Date. That’s enough for a stub, and then people are invited to add more details.
    • Canonical ID?
    • Canonical ID? Collect them.
    • Another experiment we’re looking forward to trying is about identifiers. We’re not particularly concerned about canonical identifiers. Perhaps it’s a waste of time to wait for one, so instead, we’re going to try and attach as many ID types to our records as we can. (This list is just a braindump - not active yet.) The idea is that people could add a URL or actual identifier and Open Library would just do the right thing. A suggestion (after this presentation was delivered) was that people could ping Open Library with an identifier, not even knowing what TYPE of ID it is. Perhaps Open Library could help “triangulate” this query towards a book record. “Record laundering.”
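One way to picture the "triangulation" idea is a function that looks at a bare identifier and lists the ID types it could plausibly be. This is only a thought experiment in code, not a description of anything Open Library has built: the function names are hypothetical, the OLID pattern is inferred from examples like OL7440033M, and any all-digit string is simply treated as a possible OCLC number or LCCN.

```python
import re


def is_isbn10(value):
    """Check the ISBN-10 shape and its mod-11 check digit."""
    if not re.fullmatch(r"\d{9}[\dX]", value):
        return False
    digits = [10 if ch == "X" else int(ch) for ch in value]
    return sum((10 - i) * d for i, d in enumerate(digits)) % 11 == 0


def is_isbn13(value):
    """Check the ISBN-13 shape and its alternating 1/3 weighted checksum."""
    if not re.fullmatch(r"\d{13}", value):
        return False
    return sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(value)) % 10 == 0


def guess_id_types(raw):
    """Return candidate ID types for a bare identifier, most specific first."""
    value = raw.strip().replace("-", "").upper()
    guesses = []
    if re.fullmatch(r"OL\d+[AMW]", value):  # assumed Open Library ID shape
        guesses.append("olid")
    if is_isbn10(value) or is_isbn13(value):
        guesses.append("isbn")
    if value.isdigit():  # plain numbers are ambiguous, so offer both
        guesses.extend(["oclc", "lccn"])
    return guesses


if __name__ == "__main__":
    for raw in ["OL7440033M", "0385472579", "978-0-385-47257-9", "28419896"]:
        print(raw, guess_id_types(raw))
```

Returning a ranked list instead of a single answer reflects the ambiguity: a catalog would still have to try each candidate type against its records to land on a book.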
    • Canonical ID? Exchange them.
    • You can already ping Open Library with an ID other than the Open Library identifier to see if we have any matches.
        http://openlibrary.org/books/olid/OL7440033M
        http://openlibrary.org/books/isbn/0385472579
        http://openlibrary.org/books/isbn/9780385472579
        http://openlibrary.org/books/lccn/93005405
        http://openlibrary.org/books/oclc/28419896
        http://openlibrary.org/books/id/240727
        http://openlibrary.org/books/amazon/...
        http://openlibrary.org/books/bookmooch/...
        http://openlibrary.org/books/goodreads/...
        http://openlibrary.org/books/ocaid/...
        http://openlibrary.org/books/librarything/...
        http://openlibrary.org/books/paperback_swap/...
        http://openlibrary.org/books/Your ID Here/...
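The URLs on this slide all follow one shape, `/books/<id type>/<value>`, so building them from whatever identifiers you hold is a one-liner. The helper names below are hypothetical, and the sketch only formats URLs; it does not check whether Open Library has a matching record.

```python
from urllib.parse import quote

BOOKS_BASE = "http://openlibrary.org/books"  # prefix as shown on the slide


def lookup_url(id_type, value):
    """Build a lookup URL for one identifier, e.g. ('isbn', '0385472579')."""
    return f"{BOOKS_BASE}/{quote(id_type.lower(), safe='')}/{quote(value, safe='')}"


def lookup_urls(identifiers):
    """Build one lookup URL per (type, value) pair in a dict of identifiers."""
    return [lookup_url(id_type, value) for id_type, value in identifiers.items()]


if __name__ == "__main__":
    known = {"isbn": "0385472579", "lccn": "93005405", "oclc": "28419896"}
    for url in lookup_urls(known):
        print(url)
```

A caller holding several identifiers for the same book could try each URL in turn and stop at the first that resolves.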
    • Your ID
    • Your ID
      Everyone else’s
    • Make nodes, not cards
      Some rights reserved by yobink
    • Network, not sequence
    • Thanks! George Oates, glo@archive.org, @openlibrary