The third lecture of the course I'm giving on "Interoperability and Semantic Technologies" at Politecnico di Milano in the academic year 2015-16. It introduces the Semantic Web with a brief walk through 15 years of research, standardisation and industrial uptake.
An Introduction to the Semantic Web and Linked Data
1. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
Interoperability and Semantic Technologies 2015-16
An Introduction to the Semantic Web
Emanuele Della Valle
DEIB - Politecnico di Milano
http://emanueledellavalle.org - @manudellavalle
2. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
This work is licensed under the Creative Commons Attribution 3.0
Unported License.
You are free:
to Share — to copy, distribute and transmit the work
to Remix — to adapt the work
Under the following conditions
Attribution — You must attribute the work by inserting
“by E. Della Valle – http://emanueledellavalle.org -
@manudellavalle”
at the end of each reused slide
To view a copy of this license, visit
http://creativecommons.org/licenses/by/3.0/
Share, Remix, Reuse — Legally
3. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
Introduction
The Web Today
Large number of integrations
- ad hoc
- pair-wise
Too much information to browse, need for searching and mashing up automatically
Each site is “understandable” for us
Computers don’t “understand” much
[Figure: Search & Mash-up Engine; Web pages shown as binary digits]
Millions of Applications
4. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
Introduction
The Problem: “Semantic Gap”
Sensor Data
Semantic Gap
Symbolic Description
5. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
Introduction
“Understanding” Means Bridging the Gap
understanding
Sensor Data
Symbolic Description
6. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
Introduction
Do We Really Know What “Understanding” means?
[ source http://www.thefarside.com/ ]
7. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
Introduction
Two ways for computers to “understand”
Smart Data
Smart Machine
8. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
Working examples found on the Web
Image Processing
retrievr: find by sketching
http://labs.systemone.at/retrievr/
Audio Processing
midomi: find by singing
http://www.midomi.com/
[…]
Natural Language Processing
semantic proxy:
http://semanticproxy.opencalais.com/about.html
Introduction
Smart Machines
[Figure: Image Processing, Audio Processing, Natural Language Processing, […] between Sensor Data and Symbolic Description]
9. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
Natural Language Processing (NLP)
meets Image Processing (IP)
NLP: What does your eye see?
IP : I see a sea
NLP: You see a “c”?
IP : Yes, what else could it be?
Introduction
Smart Machines alone cannot bridge the gap …
[Source NLP Related Entertainment
http://www.cl.cam.ac.uk/Research/NL/amusement.html]
[Figure: Image Processing says “sea”, Natural Language Processing hears “c”; the Semantic Gap between Sensor Data and Symbolic Description remains]
10. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
Natural Language Processing (NLP)
meets Image Processing (IP)
NLP: What does your eye see?
IP : I see a wordnet:word-sea
NLP: mmm, I see a wordnet:word-c
IP : I believe we have different
understanding of the world …
NLP: So do I
Introduction
… smart data are needed
[Figure: Sensor Data, Image Processing, Natural Language Processing, Symbolic Description; “sea” and “c” connected through smart data]
The Semantic Web offers a set of standards that lowers the barriers to employing smart data at Web scale
11. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
What we say to Web agents
“For more information visit <a href="http://www.ex.org">my company</a> Web site…”
What they “hear”
“blah blah blah blah blah <a href="http://www.ex.org">blah blah blah</a> blah blah…”
Yet this is enough to train them to achieve tasks for us
Introduction
What a machine “understands” of the Web
[ source http://www.thefarside.com/ ]
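The "what they hear" idea can be sketched in a few lines of Python. This is only an illustration, not how any real crawler works: the class name LinkOnlyAgent is made up for this example, and it relies only on the standard library html.parser module. The agent keeps the link target and treats every other word as noise.

```python
from html.parser import HTMLParser


class LinkOnlyAgent(HTMLParser):
    """Hypothetical Web agent: it 'understands' links and nothing else."""

    def __init__(self):
        super().__init__()
        self.links = []   # link targets the agent can act on
        self.heard = []   # what the page sounds like to the agent

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.links.append(href)
            self.heard.append(f'<a href="{href}">')

    def handle_endtag(self, tag):
        if tag == "a":
            self.heard.append("</a>")

    def handle_data(self, data):
        # Every human-readable word is just "blah" to the machine.
        self.heard.extend("blah" for _ in data.split())


html = 'For more information visit <a href="http://www.ex.org">my company</a> Web site.'
agent = LinkOnlyAgent()
agent.feed(html)
agent.close()
print(" ".join(agent.heard))
print(agent.links)
```

Even with so little "understanding", the extracted links are already enough input for link-based techniques such as ranking.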
12. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
By understanding that, if [page1] links to [page2], then page2 is interesting,
Google is able to rank results!
“The heart of our software is PageRank™, a system for ranking
web pages […] (that) relies on the uniquely democratic
nature of the web by using its vast link structure as an
indicator of an individual page's value.”
http://www.google.com/technology/
Introduction
What does Google “understand”?
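The intuition "a page is interesting if interesting pages link to it" can be sketched as a textbook power iteration. This is a simplified illustration and not Google's production algorithm: the damping factor 0.85, the number of iterations and the three-page link graph are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a dict {page: [linked pages]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: page1 and page3 both link to page2.
links = {"page1": ["page2"], "page3": ["page2"], "page2": ["page1"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph page2 ends up on top because two pages link to it, while page3, which nobody links to, ends up last.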
13. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
“The Semantic Web is not a separate Web, but an extension of the
current one, in which information is given well-defined meaning,
better enabling computers and people to work in cooperation.”
“The Semantic Web”, Scientific American Magazine, May 2001
http://www.sciam.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21
Key concepts
• an extension of the current Web
• in which information is given well-defined meaning
• better enabling computers and people to work in
cooperation.
• Both for computers and people
Introduction
The Semantic Web 1/4
14. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
“The Semantic Web is not a separate Web,
but an extension of the current one […] ”
Introduction
The Semantic Web 2/4
Web 1.0 The Web Today
15. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
“The Semantic Web […], in which information is given well-defined meaning […]”
Introduction
The Semantic Web 3/4
Web 1.0: human understandable but “only” machine-readable
Semantic Web: human and machine “understandable”?
16. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
Introduction
The Semantic Web 4/4
[Figure: Semantic Web pages annotated with metadata (META), feeding Semantic Mash-ups & Search]
Fewer integrations
- standard
- multi-lateral
“[…] better enabling computers and people to work in cooperation.”
Even More Applications
Easier to understand for people
More “understandable” for computers
17. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
Introduction
Linked Data Standards
View the full talk at http://www.ted.com/talks/view/id/484 !
18. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
Goal: extend the Web with a data commons by publishing open data sets using Semantic Web technologies
Introduction
Linking Open Data Project
Visit http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData !
19. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
Introduction
Example: BIO2RDF
Peter Ansell, Model and prototype for querying multiple linked scientific datasets, Future
Generation Computer Systems, Volume 27, Issue 3, March 2011, Pages 329-333
20. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
LinkedGeoData:
• is an effort to add a spatial dimension to the Semantic Web.
• uses the information collected by the OpenStreetMap project
• makes it available as an RDF knowledge base according to the Linked Data principles.
• interlinks this data with other knowledge bases in the Linking Open Data initiative.
Introduction
Example: LinkedGeoData
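A minimal sketch of what "an RDF knowledge base interlinked with other knowledge bases" can look like. The two example.org resource URIs, the coordinates and the helper names are hypothetical and are not taken from LinkedGeoData; the rdfs:label, WGS84 geo and owl:sameAs property URIs are the standard W3C ones. Literal escaping is ignored to keep the sketch short.

```python
from dataclasses import dataclass

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
GEO_LAT = "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
GEO_LONG = "http://www.w3.org/2003/01/geo/wgs84_pos#long"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical resource URIs, for illustration only.
PLACE = "http://example.org/geodata/node/1"
SAME_PLACE_ELSEWHERE = "http://example.org/other-kb/resource/Milan"


@dataclass(frozen=True)
class Literal:
    """A plain string literal, as opposed to a URI."""
    value: str


# An RDF graph is a set of (subject, predicate, object) triples.
triples = [
    (PLACE, RDFS_LABEL, Literal("Milano")),
    (PLACE, GEO_LAT, Literal("45.46")),
    (PLACE, GEO_LONG, Literal("9.19")),
    # The interlinking step: state that another data set describes the same thing.
    (PLACE, OWL_SAME_AS, SAME_PLACE_ELSEWHERE),
]


def to_ntriples(graph):
    """Serialize triples in an N-Triples-like syntax (no literal escaping)."""
    lines = []
    for subject, predicate, obj in graph:
        rendered = f'"{obj.value}"' if isinstance(obj, Literal) else f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


def linked_resources(graph, subject):
    """Follow owl:sameAs links from a resource to other knowledge bases."""
    return [o for s, p, o in graph if s == subject and p == OWL_SAME_AS]


print(to_ntriples(triples))
print(linked_resources(triples, PLACE))
```

A client that dereferences the owl:sameAs target can merge what both data sets say about the same place, which is the point of the Linked Data principles.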
21. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
Introduction
data.gov and data.gov.uk
22. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
Introduction
https://open-data.europa.eu/en/data
23. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
• Who: Richard MacManus
• When: April 15th, 2010
• Context: Modigliani’s paintings are scattered all over the world
• The challenge: if all museums had published their collections as linked data, would it be possible to know the locations of all the original paintings of Modigliani?
• http://readwrite.com/2010/04/15/the_modigliani_test_semantic_web_tipping_point
Introduction
The Modigliani test for Linked Data
25. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
The Results of Modigliani test for Linked Data
• Who: Atanas Kiryakov (Ontotext AD)
• When: April 25th, 2010
• How: http://factforge.net/ a “reason-able” view to the web of data
• Results: http://bit.ly/ModiglianiTest
http://readwrite.com/2010/04/25/the_modigliani_test_for_linked_data
Introduction
The Modigliani test for Linked Data
26. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
• Since Fall 2009
• 450,000 products
• Using RDFa (= RDF embedded in HTML)
• Pages with RDFa rank higher in Google
• BestBuy claims 30% more traffic!
• Yahoo reports a 15% higher click-through rate
Introduction
Example: Best Buy
28. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
Google for Nikon+12.3-Megapixel+Digital+SLR+Camera
https://www.google.com/search?q=Nikon+12.3-Megapixel+Digital+SLR+Camera
Introduction
Example: Best Buy
29. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
Uses RDFa with a Facebook-specific vocabulary
og:title - The title of your object, e.g., "The Rock".
og:type - The type of your object, e.g., "movie".
og:image - An image URL
og:url - The permanent ID of your object
og:description - A one to two sentence description of your object.
og:site_name - If your object is part of a larger web site, the name which should be displayed for the overall site, e.g., "IMDb".
Introduction
Example: Facebook Open Graph
http://ogp.me/
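A minimal sketch of how a consumer could read these properties from a page, using only the Python standard library. The sample page, its example.org URLs and the class name OpenGraphParser are assumptions made for illustration; the property names and the "The Rock" / "movie" / "IMDb" values are the ones listed above.

```python
from html.parser import HTMLParser

# Hypothetical page: the example.org URLs are placeholders.
SAMPLE_PAGE = """
<html prefix="og: http://ogp.me/ns#">
<head>
  <title>The Rock (1996)</title>
  <meta property="og:title" content="The Rock" />
  <meta property="og:type" content="movie" />
  <meta property="og:url" content="http://example.org/movies/the-rock" />
  <meta property="og:image" content="http://example.org/images/the-rock.jpg" />
  <meta property="og:site_name" content="IMDb" />
</head>
<body><p>Text for humans.</p></body>
</html>
"""


class OpenGraphParser(HTMLParser):
    """Collects og:* properties from <meta property=... content=...> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property")
        content = attributes.get("content")
        if name and name.startswith("og:") and content is not None:
            self.properties[name] = content


og = OpenGraphParser()
og.feed(SAMPLE_PAGE)
og.close()
for name, value in og.properties.items():
    print(name, "=", value)
```

The human-readable body is ignored entirely: the machine works only from the explicit property/content pairs.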
30. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
Open Graph Usage Statistics
15 million sites are using Open Graph! 39% of the top 10,000 sites
Introduction
Example: Facebook Open Graph
[Source: http://trends.builtwith.com/docinfo/Open-Graph-Protocol]
[Figure: Open Graph usage (%), 2010 to 2015]
31. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
Introduction
Industrial uptake of Semantic markup
2009/2010: 147,871,837 URLs, microformats 80%
Winter 2014: 620,151,400 URLs, RDFa + microdata 80%
[Source: http://webdatacommons.org/structureddata/index.html#results-2013-1 ]
32. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
RDFa, microformats and "rich snippet" on Google Trends
Take home message:
data formats matter, but data usage matters even more!
Introduction
Usage matters!
33. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
Introduction
Usage matters! Yahoo! Search Monkey
[source https://developer.yahoo.com/searchmonkey/siteowner.html ]
34. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
Introduction
Usage matters! Google structured data
[source https://developers.google.com/structured-data/rich-snippets/ ]
35. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
• Schema.org
• an initiative launched on 2 June 2011 by Bing, Google and Yahoo!
• to “create and support a common set of schemas for structured data markup on web pages.”
• Like microformats, it is a collection of vocabularies, but they are organized in a broad type hierarchy (like RDF Schema)
• See http://schema.org/docs/full.html
• Initially schema.org introduced yet another type of semantic markup (i.e., microdata), but it stepped back and now recommends either microdata or RDFa
• Tools are available: http://schema.rdfs.org/tools.html
Introduction
schema.org
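The "broad type hierarchy (like RDF Schema)" can be sketched as a parent table plus a subtype check. Only a handful of well-known schema.org types are included, and the sketch assumes a single parent per type, which is a simplification (schema.org allows a type to have several supertypes). The function names are made up for this example.

```python
# A tiny excerpt of the schema.org hierarchy: type -> parent type.
PARENT = {
    "Thing": None,
    "CreativeWork": "Thing",
    "Movie": "CreativeWork",
    "Product": "Thing",
    "Person": "Thing",
}


def ancestors(type_name):
    """Return the chain of supertypes, nearest first."""
    chain = []
    parent = PARENT[type_name]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def is_a(type_name, candidate):
    """RDFS-style subtype check: every Movie is also a CreativeWork and a Thing."""
    return type_name == candidate or candidate in ancestors(type_name)


print(ancestors("Movie"))
print(is_a("Movie", "Thing"))
print(is_a("Product", "CreativeWork"))
```

This is what the hierarchy buys a consumer: a search engine that only knows how to display a CreativeWork can still do something sensible with a page marked up as a Movie.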
36. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
There are around 1200 types in the hierarchy
[Source: http://blog.schema.org/2015/11/schemaorg-whats-new.html]
Introduction
schema.org
37. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
Beyond rich snippets, into the Knowledge Graph
Introduction
schema.org
38. E. Della Valle – http://emanueledellavalle.org - @manudellavalle
Introduction
Semantic Web “layer cake”
[Figure legend: Standardized; Under Investigation; Already Possible]
[ source http://www.w3.org/2007/03/layerCake.png ]