An introduction to the Semantic Web landscape as it stands near the end of 2011. Includes an introduction to the core technologies in the Semantic Web technology stack.
This material was presented at the November, 2011, Cambridge Semantic Web meetup.
[Slide 39 diagram. Data sources: Dealer 1, Dealer 2, Dealer 3, Employee Directory, ERP / Budget System, Web EPA Fuel Efficiency Spreadsheet. Component: SPARQL Query Engine.]
What automobiles get more than 25 miles per gallon and can be purchased at a
dealer located within 10 miles of one of my employees?
SELECT ?automobile
WHERE {
  ?automobile a ex:Car ;
              epa:mpg ?mpg ;
              ex:dealer ?dealer .
  ?employee a ex:Employee ;
            geo:loc ?loc .
  ?dealer geo:loc ?dealerloc .
  FILTER(?mpg > 25 &&
         geo:dist(?loc, ?dealerloc) <= 10) .
}
Web dashboard SPARQL query
This initial example shows how a Semantic Web approach differs from both traditional search and siloed databases.
Search gives lots of potential hits, but no targeted results. It requires a large investment to manually sort through the hits to see if there are any concrete results within them.
Search within a specific domain is a bit better, but shares the same basic limitations.
This is probably the most common way to find answers – look in specific databases. As soon as a problem spans multiple databases, however, you run into the silo problem: you need to ask part of the question against database A, part against database B, and manually figure out how to combine the results from both to get an answer to the overall question. This is magnified if there are 3, 5, or 20 databases involved in getting an answer.
With a Semantic Web approach, the silos are broken down, the data is linked together, and a single query can return a targeted list of specific results.
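As a rough illustration of that contrast, the sketch below merges three toy data sources into one set of subject-predicate-object triples and answers the slide's question in a single pass. Every identifier, value, and the flat-plane distance calculation is a hypothetical stand-in, and the car type check from the slide's query is omitted for brevity. A real deployment would use an RDF store and a SPARQL engine rather than Python sets.

```python
import math

# Hypothetical toy data: each source is a set of (subject, predicate, object) triples.
epa = {("car:A", "epa:mpg", 31), ("car:B", "epa:mpg", 19)}
dealers = {
    ("car:A", "ex:dealer", "dealer:1"),
    ("car:B", "ex:dealer", "dealer:2"),
    ("dealer:1", "geo:loc", (0.0, 3.0)),
    ("dealer:2", "geo:loc", (0.0, 4.0)),
}
employees = {
    ("emp:ann", "rdf:type", "ex:Employee"),
    ("emp:ann", "geo:loc", (0.0, 0.0)),
}

# Merging the silos is a set union; no schema has to be reconciled first.
graph = epa | dealers | employees


def objects(subject, predicate):
    """Return every object stored for the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def efficient_cars_near_employees(min_mpg=25, max_miles=10):
    """One pass over the merged graph, loosely mirroring the slide's SPARQL query."""
    staff = [s for s, p, o in graph if p == "rdf:type" and o == "ex:Employee"]
    staff_locs = [loc for e in staff for loc in objects(e, "geo:loc")]
    hits = set()
    for car, p, mpg in graph:
        if p != "epa:mpg" or mpg <= min_mpg:
            continue
        for dealer in objects(car, "ex:dealer"):
            for dealer_loc in objects(dealer, "geo:loc"):
                # Assumption: locations are (x, y) coordinates in miles on a flat plane.
                if any(math.dist(dealer_loc, loc) <= max_miles for loc in staff_locs):
                    hits.add(car)
    return sorted(hits)
```

Calling `efficient_cars_near_employees()` walks all three former silos at once; no intermediate result has to be copied by hand from one system to another.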
One of the goals of this tutorial is to demystify all of the names of technologies, tools, projects, etc. that swirl around the Semantic Web story. And since, as I researched this presentation, I saw that everyone seems to like this particular Gary Larson cartoon, it behooved me to include it.
Different nuances, but the same actual thing. Still, you can often tell a lot about someone’s view of the Semantic Web from the terms they choose to describe it. The Linked Data Web has been, relatively speaking, successful in gaining traction.
This is the ultimate vision as per the original Scientific American article. Referred to last week as the “top-down approach”.
Many of the people that have been building the technologies, standards, and tools are doing so with these ends in mind. They have (disruptive, game-changing) problems today and these technologies provide a way to solve them today.
http://www.flickr.com/photos/reverendsam/2367569306/
The good: emphasizes the importance of the foundational layers (URIs and RDF); emphasizes the long-term roadmap/vision of what’s needed for the Semantic Web.
The bad: implies that perhaps things can’t be taken seriously until all the pieces are in place; implies an order to the research; various versions of the cake tell different stories (importance of XML, absence of query, lack of UI/application layer, …).
Valentin Zacharias wrote about the “infamy” part of the layer cake here: http://www.valentinzacharias.de/blog/2007/04/ban-semantic-web-layer-cake.html
What’s been happening this whole time? (Between the introduction of the vision and today.) A lot of technology, standards, tool, and product development. Also, a lot of advocacy.
The Ontology/ontology dichotomy is captured well by Jim Hendler at http://www.cs.rpi.edu/%7Ehendler/presentations/SemTech2008-2Towers.pdf
Of course, these are all unsolved problems in the relational world as well, but they may be magnified with the highly distributed nature of the Semantic Web.
Definition.
Prescriptive.
Descriptive.
Formal.
The first is as opposed to relational tables or XML schemas, where the schema needs to be explicitly adjusted to accommodate whatever data is being merged.
The second is due to the expressivity of the model: it can handle lists, trees, n-ary relations, etc.
The third is as opposed to table and column identifiers or XML attribute names.
(This slide is best told with animation in the original PowerPoint.) The Semantic Web paradigm allows new and updated data to be brought “into the fold” incrementally, without starting over. This makes it particularly amenable to changing requirements.
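To make the incremental point concrete, here is a minimal sketch that models a graph as a Python set of triples. The identifiers (`emp:ann`, `ex:office`, and so on) are invented for illustration. It shows a query written before new data arrived continuing to work after the graph grows, with no schema migration step.

```python
# Hypothetical starting graph: one employee with a name.
graph = {("emp:ann", "ex:name", "Ann")}


def names(g):
    """An 'old' query, written before any new kind of data arrived."""
    return sorted(o for s, p, o in g if p == "ex:name")


before = names(graph)

# New requirement: track office locations. No schema change is needed;
# the new predicates simply appear as additional triples.
graph |= {
    ("emp:ann", "ex:office", "office:cambridge"),
    ("office:cambridge", "ex:city", "Cambridge"),
}

after = names(graph)


def city_of(g, person):
    """A 'new' query that follows the freshly added links."""
    offices = [o for s, p, o in g if s == person and p == "ex:office"]
    return sorted(o for s, p, o in g if s in offices and p == "ex:city")
```

The old query returns the same answer before and after the update, while `city_of(graph, "emp:ann")` can immediately use the new links.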
Main message: With relational technology, you lose the meaning of the data as soon as it exits the database – and so you end up hard-coding the knowledge of what the data means throughout every part of a software application. This is very error-prone and means it’s extremely tough to change anything.
Main message: With semantics, the data always travels with its meaning, and the data looks the same inside and outside the database.
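The difference between these two messages can be sketched in a few lines. The row layout, column index, and identifiers below are all hypothetical; the point is only that a positional row needs outside knowledge to interpret, while each triple names its own meaning.

```python
# A row exported from a relational table: the meaning of each position lives
# only in the schema, so every consumer has to hard-code it.
row = ("A", 31, "dealer-1")
MPG_COLUMN = 1  # schema knowledge duplicated in application code
mpg_from_row = row[MPG_COLUMN]

# The same facts as triples: each value is paired with a predicate that names
# what it means. The identifiers are made up for this sketch.
triples = [
    ("car:A", "epa:mpg", 31),
    ("car:A", "ex:dealer", "dealer:1"),
]


def value_of(data, subject, predicate):
    """Look a value up by what it means, not by where it sits."""
    for s, p, o in data:
        if s == subject and p == predicate:
            return o
    return None


mpg_from_triples = value_of(triples, "car:A", "epa:mpg")

# Reordering or extending the triples does not break the lookup, whereas
# reordering the row's columns would silently invalidate MPG_COLUMN.
shuffled = list(reversed(triples)) + [("car:A", "ex:color", "red")]
mpg_after_change = value_of(shuffled, "car:A", "epa:mpg")
```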
Definition.
Prescriptive.
Descriptive.
Descriptive (part 2). This is leagues ahead of the situation with SQL! Major deployment help on the Web.
Definition.
Definition.
We call this “semantic data virtualization”. Databases that traditionally manage enterprise data are IT artifacts. They’re crafted by IT, for IT: asking scientists or other business domain experts to understand a relational model with scores of tables, IDs, key/value tables, unused columns, etc. is completely unrealistic.
The semantic model is a conceptual model. It eschews IDs, keys, etc. in favor of concepts and relationships expressed/expressible in human language. This is reflected in software that is built with Semantic Web data. This means that when a researcher is linking their results spreadsheet, they’re dealing only in concepts that they’re familiar with (trades, accounts, settlements, securities, etc.). And that in turn means that this approach works regardless of whatever spreadsheet layout a particular collaborator is using: researchers can continue using their current spreadsheets, with no change.
Also see examples later in this deck – Semantic Web In Use – would be nice to have an example of RDFa markup
Possible answers:
* Few people are driven by data ownership, data portability
* People are drawn to specific sites
* People _want_ to segment their online profiles (cf. Facebook vs. LinkedIn)
Drupal, which runs 1% of the world’s Web sites, is on the leading edge of adoption of the Semantic Web for content-driven sites. Drupal 7 exposes the semantics of Drupal sites’ natural structures to Google/Yahoo! with RDFa. There are also modules for SIOC and Facebook OGP.
The key point here is that though FB published this protocol, it relies on open Semantic Web standards (RDFa) that anyone else can consume. The same semantics allow people to link the “Like” button to the type of artifact being liked (movie, here) and also can allow search engines to give more structure, query engines to find more data, etc.
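Because the markup is ordinary RDFa-style `property`/`content` attributes, any consumer can read it, not just Facebook. The sketch below uses only Python's standard `html.parser` module. The page fragment and its values are illustrative and not taken from the slide, and the `OpenGraphParser` class is a hypothetical helper rather than a full RDFa processor.

```python
from html.parser import HTMLParser

# Illustrative page head; the values are made up for this sketch.
PAGE = """
<html>
<head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
<meta property="og:url" content="http://www.example.com/movies/the-rock" />
</head>
</html>
"""


class OpenGraphParser(HTMLParser):
    """Collect property/content pairs from <meta> tags whose property starts with 'og:'."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            self.properties[prop] = content


def extract_open_graph(html):
    """Return a dict of Open Graph properties found in the given HTML text."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties
```

A search engine, a query engine, or a "Like" button could each call something like `extract_open_graph(PAGE)` and learn that the page describes a movie, without any site-specific scraping rules.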
Image courtesy of http://bio2rdf.org/. Scientific data makes up a significant portion of the current Linked Data Web. This is information on proteins and genes, pathways and sequences, chemistry and genetics, … This diagram shows some of the information available and how it’s linked together. Nodes are sized according to their quantity of data, and links are sized according to the quantity of links.
Google (Rich Snippets) and Yahoo! (originally SearchMonkey) consume semantic markup to enhance search listings.
Many enterprise uses of Semantic Web / Linked Data are highlighted at: http://www.w3.org/2001/sw/sweo/public/UseCases/
Question: Where in this scenario do you think Semantic Web concepts and technologies are being employed? What would the alternative be?
Answers: integrating data to get as large a universe as possible; rules and reasoning to intelligently filter the data
Combine manual tagging with ontology-driven reasoning and ontology-driven dynamic aggregation (700 index pages, more than the rest of the sports site combined) to produce a dynamic, cross-indexed, cross-linked, useful site for the World Cup.
What is the semantic value here?
* Produce an information-rich site at many levels of aggregation (player, team, geography, group, …) without employing a large fleet of editors to curate the site’s _content_. Instead, maintain an ontology and provide a content tagging process.
* Use the ontology to help automate the tagging process (forward-chaining inference based on taxonomies).
For more details:
http://www.bbc.co.uk/blogs/bbcinternet/2010/07/bbc_world_cup_2010_dynamic_sem.html
http://www.bbc.co.uk/blogs/bbcinternet/2010/07/the_world_cup_and_a_call_to_ac.html
Other governments with similar efforts: Australia, Sweden, New Zealand, …, various local governments
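The tagging idea can be sketched as forward chaining over a small taxonomy: an editor tags a story once, and broader tags are derived automatically so the story appears on every relevant index page. The taxonomy, tag names, and story identifiers below are hypothetical, and this is only a toy illustration of the general technique, not the BBC's actual implementation.

```python
# Hypothetical taxonomy: each key "belongs to" the value one level up.
PARENT = {
    "player:a": "team:x",
    "player:b": "team:y",
    "team:x": "group:g",
    "team:y": "group:g",
    "group:g": "event:cup",
}


def infer_tags(manual_tags):
    """Forward-chain up the taxonomy until no new tags can be derived."""
    tags = set(manual_tags)
    frontier = list(manual_tags)
    while frontier:
        parent = PARENT.get(frontier.pop())
        if parent is not None and parent not in tags:
            tags.add(parent)
            frontier.append(parent)
    return tags


def build_index(stories):
    """Map every tag, manual or inferred, to the stories that belong on its page."""
    index = {}
    for story, manual_tags in stories.items():
        for tag in infer_tags(manual_tags):
            index.setdefault(tag, []).append(story)
    return {tag: sorted(found) for tag, found in index.items()}
```

With this shape, adding a player or a team means adding one entry to the taxonomy rather than editing index pages by hand; the aggregation pages fall out of `build_index`.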