Hi, I’m Julia Bauder, Data Services Librarian at Grinnell College, and today I’d like to give you a brief introduction to the Semantic Web.
The VERY brief explanation of the Semantic Web is that it’s a Web where information is formatted to be read and processed by machines, unlike the current Web, where most of the formatting is about making information look pretty for people. That might not sound like a big deal, but if the Semantic Web ever really gets going it could be a huge time-saver for people who are looking for information. (pause) I’ll get back to the part about how it could save you time in a minute, but first, I want to talk a little bit about how it works.
So, librarians already understand the value of “uniform identifiers” – one agreed-upon label, where everybody knows what we’re talking about when we use it. ISBNs and ISSNs are good examples, as are the headings in the LC Authority Files. Using ISBNs, we can be sure that a book we find on Amazon is the same as a book we have in our local catalog; using the LC authorized form of a name, we can be sure that we’re bringing together all of the information about an author or a work under the same heading.
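One wrinkle with identifiers like these is that the same book can carry two forms of the same ISBN. The slides for this talk list the one-volume edition as ISBN-13 978-0618645619, while the book URI shown on a later slide ends in the ISBN-10 0618645616. A minimal sketch, assuming the standard ISBN-10 to ISBN-13 conversion (978 prefix plus a recomputed check digit), shows that the two forms identify the same book. The function name is made up for illustration and does not come from any library.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to its ISBN-13 form (978 prefix, new check digit)."""
    digits = isbn10.replace("-", "").replace(" ", "")
    if len(digits) != 10:
        raise ValueError("expected a 10-character ISBN")
    # Keep the first nine digits; the old ISBN-10 check digit is dropped.
    core = "978" + digits[:9]
    # ISBN-13 check digit: weights alternate 1, 3 across the first 12 digits.
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(core))
    check = (10 - total % 10) % 10
    return core + str(check)


print(isbn10_to_isbn13("0618645616"))  # 9780618645619
```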
Uniform identifiers somewhat like these, called URIs, are part of what makes the Semantic Web work: they allow everyone on the Web – both machines and people – to be clear about what is meant when a term is used. Unlike in the library world, anybody can make up their own set of uniform identifiers if they want. Some library authorities have created uniform identifiers for their data – the Library of Congress is in the process of creating URIs for the terms in their authority files; there are official URIs for the Dublin Core metadata terms – but then you get things like a professor at the Free University of Berlin writing a program to create URIs automatically for every book on Amazon.
URIs by themselves are helpful, but where the Semantic Web really makes a contribution is in allowing you to combine URIs to assert something. For example, let’s suppose we want to say that J. R. R. Tolkien is the creator of The Lord of the Rings. There are universal, machine-processable terms for each of these three concepts, and RDF – Resource Description Framework, one of the languages used on the Semantic Web – will let us glue them together in a meaningful way.
This is what that assertion might look like in RDF. If you’re familiar with XML, you can probably tell what’s going on here. If you’re not familiar with XML, you can still basically see what’s going on – we’re using RDF to describe the thing identified by the first URI – the URI that identifies the 50th anniversary edition of the Lord of the Rings – and we’re saying that it has the creator identified by the second URI – Tolkien. If you look at the Dublin Core site, it will give you more examples of different things you can say about works using Dublin Core in RDF.
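To make the "thing, property, value" shape of that assertion concrete, here is a rough sketch in Python that reads the RDF from the slide and pulls out the three URIs. It uses only the standard library's XML parser, not a real RDF toolkit, and the helper name extract_triples is an assumption made for illustration. It handles only this very simple form of RDF/XML.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The assertion from the slide, with whitespace tidied.
RDF_XML = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://www4.wiwiss.fu-berlin.de/bookmashup/doc/books/0618645616">
    <dcterms:creator rdf:resource="http://dbpedia.org/resource/J._R._R._Tolkien"/>
  </rdf:Description>
</rdf:RDF>
"""


def extract_triples(rdf_xml: str) -> list[tuple[str, str, str]]:
    """Return (subject, predicate, object) URIs from simple RDF/XML.

    Hypothetical helper: it only understands rdf:Description elements whose
    children point at another resource via rdf:resource.
    """
    triples = []
    root = ET.fromstring(rdf_xml)
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            obj = prop.get(f"{{{RDF_NS}}}resource")
            if obj is not None:
                # ElementTree stores tags as "{namespace}local"; joining the
                # two parts gives the predicate URI.
                namespace, local = prop.tag[1:].split("}")
                triples.append((subject, namespace + local, obj))
    return triples


for subject, predicate, obj in extract_triples(RDF_XML):
    print(subject, predicate, obj, sep="\n")
```

The three lines printed are the book's URI, the Dublin Core "creator" URI, and the URI for Tolkien, which is all a machine needs in order to record the statement.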
In addition to telling computers specific things, like “Tolkien is the author of Lord of the Rings,” it can also be helpful to express stuff that everybody knows about the world in terms that allow computers to “know” it, too. For example, we know that novelists are a subclass of writers, but if you ask a dumb, excessively literal computer for information about people who are writers, and some of those people are labeled “novelists” rather than “writers,” it doesn’t work out so well. And it would be a pain to lay out all of that information explicitly for every single person – Tolkien is a novelist, Tolkien is a writer, Tolkien is a person, Tolkien is an intelligent agent, Tolkien could do all of the things intelligent agents can do…. So having all of that common-knowledge type information about classes of things laid out somewhere that computers can draw on it makes life a little easier – you don’t have to be quite so explicit when you tell a computer to find something, and the computers won’t make dumb-for-a-human mistakes quite so often.
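The idea can be sketched in a few lines of Python. This is a toy illustration, not how Cyc or any real reasoner is implemented: the class links are the ones mentioned in this talk, and the dictionary layout and function names are assumptions made for the example.

```python
# Direct "is a type of" links, taken from the examples in this talk.
SUBCLASS_OF = {
    "novelist": ["writer"],
    "writer": ["person"],
    "person": ["intelligent agent"],
    "hobbit": ["fictional being", "intelligent agent"],
}

# Each individual is labeled only once, with its most specific class.
LABELS = {"Tolkien": "novelist"}


def all_superclasses(cls: str) -> set[str]:
    """Follow subclass links upward and collect every ancestor class."""
    found = set()
    to_visit = [cls]
    while to_visit:
        current = to_visit.pop()
        for parent in SUBCLASS_OF.get(current, []):
            if parent not in found:
                found.add(parent)
                to_visit.append(parent)
    return found


def is_a(individual: str, cls: str) -> bool:
    """True if the individual's label is cls or any subclass of cls."""
    label = LABELS[individual]
    return cls == label or cls in all_superclasses(label)


print(is_a("Tolkien", "writer"))             # True
print(is_a("Tolkien", "intelligent agent"))  # True
print(is_a("Tolkien", "hobbit"))             # False
```

Tolkien is labeled only as a novelist, yet a search for writers still finds him, because the class information is stored once and shared by every individual.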
Cyc is a project that’s been around for a while, trying to develop an ontology of everything in the world for different kinds of computer applications, and they’ve made their ontology openly available in Semantic Web format. I’ve edited this a little bit for length, so it would fit on one slide, but this is an example of the sort of stuff you can say about classes of things and how you can say it. We’ve got two more languages going on here – RDFS, which stands for RDF Schema, and OWL, which stands for Web Ontology Language. Same basic concept as RDF, but they’re especially designed to let us create structured vocabularies and ontologies – they let us talk about classes and subclasses and the relationships between classes. Unfortunately OpenCyc uses these non-human-friendly identifiers, but you can see that we’re talking about hobbits – because they kindly gave us an English-language label for that one, at least. So we can say that hobbits are a subclass of these other two things – if you look at the human-readable version, it shows that those things are “fictional being” and “intelligent agent”. And we can say that hobbits are a type of organism – that fourth gobbledygook identifier is for “organism” – and we can also say that the thing OpenCyc refers to when it talks about hobbits is the same as “hobbits” in UMBEL and DBpedia, two other sites with Semantic Web content. If you want, you can look at the human-readable version of this next to this slide, and it will be clearer.
The Semantic Web isn’t quite here yet, but someday soon you’ll be able to ask a question like this and have a computer generate this list for you, even if nobody has yet thought to create a page with this information. The data that you would need to create this list is already available on the Semantic Web, although a user-friendly, natural-language Semantic Web browser that can handle a query this complex doesn’t exist yet, as far as I know.
But if we had such a Semantic Web browser, it could start at DBpedia – a project to take information from Wikipedia and make it available in Semantic Web formats. DBpedia says that Tolkien and some other folks are all part of a group “fellows of Pembroke College, Oxford.” And then DBpedia says that its Tolkien is the same person as Tolkien in Freebase, a site where users can submit structured information that’s then made available in Semantic Web formats. And someone has submitted to Freebase that Tolkien won this award. So the computer would add that award to the list and then go back out and try this again – see if it can find more awards for Tolkien, see if it can find awards won by other fellows of Pembroke College. Since this is all written in a machine-readable format on the Semantic Web, a computer could go the whole way from DBpedia’s list of fellows of Pembroke College to the awards won by each of them without any human intervention. The Semantic Web isn’t quite there yet, but the day is coming soon. And there are already some useful things that you can do with the linked data currently available, even without full Semantic Web capabilities.
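The hops described above can be sketched as a tiny in-memory example in Python. The URIs for Tolkien, the award, and the group name come from this talk; the short property names ("memberOf", "sameAs", "wonAward") are simplified stand-ins chosen for illustration, not the real DBpedia or Freebase property URIs, and a real browser would fetch this data over the Web rather than from a list.

```python
DBPEDIA_TOLKIEN = "http://dbpedia.org/resource/J._R._R._Tolkien"
FREEBASE_TOLKIEN = "http://rdf.freebase.com/ns/en.j_r_r_tolkien"
FREEBASE_AWARD = "http://rdf.freebase.com/ns/en.international_fantasy_award_for_fiction"
GROUP = "fellows of Pembroke College, Oxford"

# (subject, predicate, object) statements; predicate names are stand-ins.
TRIPLES = [
    (DBPEDIA_TOLKIEN, "memberOf", GROUP),              # said by DBpedia
    (DBPEDIA_TOLKIEN, "sameAs", FREEBASE_TOLKIEN),     # said by DBpedia
    (FREEBASE_TOLKIEN, "wonAward", FREEBASE_AWARD),    # said by Freebase
]


def objects(subject, predicate):
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def same_as_closure(uri):
    """All URIs linked to uri by sameAs in either direction, plus uri itself."""
    seen = {uri}
    to_visit = [uri]
    while to_visit:
        current = to_visit.pop()
        linked = objects(current, "sameAs") + subjects("sameAs", current)
        for other in linked:
            if other not in seen:
                seen.add(other)
                to_visit.append(other)
    return seen


def awards_for_group(group):
    """Map each group member to the awards recorded under any of its URIs."""
    results = {}
    for member in subjects("memberOf", group):
        awards = set()
        for alias in same_as_closure(member):
            awards.update(objects(alias, "wonAward"))
        results[member] = sorted(awards)
    return results


print(awards_for_group(GROUP))
```

No single site states that a fellow of Pembroke College won this award; the answer appears only once the "same as" link joins the DBpedia and Freebase statements.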
If you’ve worked with XML at all, this should look fairly familiar. You see all of the URIs they’re using are Freebase URIs, but they wouldn’t have to be.
There are a few Semantic Web browsers available – Disco is one, and I have the link to that one here. If you’re interested in more, click on any of the links in this presentation that go to DBpedia and look at the bottom of the page; it has a row of links there to some other Semantic Web browsers. The important thing to remember about Semantic Web browsers is that right now they only work on URIs, not on natural language. But if you go to Disco and search for the DBpedia URI for Tolkien, it will pull in information about Tolkien from different places around the Semantic Web and display it there in a structured format. It’s not necessarily the prettiest or most useful thing quite yet, but it’s definitely a technology to watch as it continues to improve and as more data becomes available in Semantic Web formats.
The Semantic Web
Julia Bauder, Grinnell College Libraries
Social Software Showcase 2009
What is the Semantic Web? A Web where information is formatted to be read and processed by machines, not people. Why should you care? It will save you time!
Uniform Identifiers in Libraries
ISBNs: 978-0618645619 = Lord of the Rings 50th Anniversary one-volume edition, hardcover
ISSNs: 1547-3155 = Tolkien Studies: An Annual Scholarly Review, print version
Library of Congress Authorized Headings: Tolkien, J. R. R. (John Ronald Reuel), 1892-1973
Uniform Identifiers on the Web
http://www4.wiwiss.fu-berlin.de/bookmashup/doc/books/0618645616 = Lord of the Rings 50th Anniversary one-volume edition, hardcover
Make a Statement with URIs
Using RDF, a language that can be written in XML, combine URIs to assert something:
J. R. R. Tolkien is the creator of The Lord of the Rings
J. R. R. Tolkien (http://dbpedia.org/resource/J._R._R._Tolkien)
is the creator of (http://purl.org/dc/elements/1.1/creator)
The Lord of the Rings 50th anniversary edition, hardcover (http://www4.wiwiss.fu-berlin.de/bookmashup/doc/books/0618645616)
In RDF
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://www4.wiwiss.fu-berlin.de/bookmashup/doc/books/0618645616">
    <dcterms:creator rdf:resource="http://dbpedia.org/resource/J._R._R._Tolkien"/>
  </rdf:Description>
</rdf:RDF>
Syntax from http://dublincore.org/documents/dc-rdf/
Add Some Information about Classes of Things
A novelist is a type of writer.
A writer is a type of person.
A person is a type of intelligent agent.
(A hobbit is a type of intelligent agent, too!)
An intelligent agent can know about things.
(Taken from OpenCyc, http://www.opencyc.org)
What it looks like inside OpenCyc
<owl:Class rdf:about="Mx4rvdRsSZwpEbGdrcN5Y29ycA">
  <rdfs:label xml:lang="en">hobbit</rdfs:label>
  <rdfs:subClassOf rdf:resource="Mx4rvVinb5wpEbGdrcN5Y29ycA"/>
  <rdfs:subClassOf rdf:resource="Mx4rwQwwBZwpEbGdrcN5Y29ycA"/>
  <rdf:type rdf:resource="Mx4rvVjf5JwpEbGdrcN5Y29ycA"/>
  <owl:sameAs rdf:resource="http://umbel.org/umbel/sc/Hobbit"/>
  <owl:sameAs rdf:resource="http://dbpedia.org/resource/Hobbit"/>
</owl:Class>
Human-readable version: http://sw.opencyc.org/concept/Mx4rvdRsSZwpEbGdrcN5Y29ycA
(Someday Soon) Let the Semantic Web Do the Work
“Computer, I want a list of awards won by fellows of Pembroke College, Oxford, for books they have written!”
What it looks like inside Freebase
<fb:award.award_winner.awards_won>
  <fb:award.award_honor rdf:about="http://rdf.freebase.com/ns/guid.9202a8c04000641f80000000073950b9">
    <fb:award.award_honor.year>1957</fb:award.award_honor.year>
    <fb:award.award_honor.award_winner rdf:resource="http://rdf.freebase.com/ns/en.j_r_r_tolkien"/>
    <fb:award.award_honor.award rdf:resource="http://rdf.freebase.com/ns/en.international_fantasy_award_for_fiction"/>
    <fb:award.award_honor.honored_for rdf:resource="http://rdf.freebase.com/ns/en.the_lord_of_the_rings"/>
  </fb:award.award_honor>
</fb:award.award_winner.awards_won>
Browse the Semantic Web Today!
There are a few Semantic Web browsers available, including Disco (http://www4.wiwiss.fu-berlin.de/rdf_browser/)
To browse the Semantic Web, start by entering a URI, not text.
Try the URI for Tolkien: http://dbpedia.org/resource/J._R._R._Tolkien