Semantic Web and Graphical Representation of Information
Chao-Hsuan Shen
Winter 2014

Audience
To begin, I want you to ask yourself one important question: What’s the point of all those connections
and data, anyway?
Table of Contents
Audience
Table of Contents
Introduction
Semantic Web
What does semantic mean?
What is semantic web?
How to build a semantic web?
Difference between original Web 2.0 and semantic web
Applications
Conclusion and the future
Bibliography
Introduction
In the book “Connectome,” Sebastian Seung, an MIT professor of computational neuroscience,
describes a trend that has been haunting me since I first read it. The vision is as follows:
In the early days of science, mathematics, physics and chemistry shaped a materialist world view,
under which we interpreted things as “a bunch of atoms.” Then, with the advance of mechanics, biology
and neuroscience, we marveled at the intricate machinery of working systems, and a mechanistic world
view formed: we started to interpret things as “machines.” Now, under the influence of computer
science and the Internet, things are “a bunch of information.”
Nowadays we are inundated by information. Enormous amounts of data are being generated, processed,
transferred, and stored. All of this is possible because of advances in computing power and the
development of Internet infrastructure and protocols. To move forward, we need to change the way we
use the Internet. The classic web is a “web of documents.” The semantic web is a “web of data.” The
ultimate goal of the web of data is to enable computers to do more useful work and to develop systems
that can support trusted interactions over the network. What is it really? How do we make it happen?
Semantic Web
What does semantic mean?
“Semantic: of or relating to meaning in language” (Merriam-Webster)
It is all about meaning. What is meaning? How do we come to have one?
In spoken language, we define meaning as “the idea that is represented by a word, phrase, etc.”
First we have vocabulary, but words alone are not sufficient to represent intricate ideas. That is why
we need grammar to combine words into sentences, and within this framework we can keep building more
complex ideas. This is the process of representing ideas in spoken language. We could say that meaning
is defined by combining entities according to predefined syntactic rules.
To elaborate, let’s run a thought experiment. Assume I don’t know much English and all I have at hand
is a Merriam-Webster dictionary. Say I am looking for the meaning of an unknown word. I open the
dictionary, find the word, and its explanation pops out. But, to be more analytical and skeptical, I am
confused again, because the meaning of the word is defined by a combination of several other words,
none of which I really understand. So I look up the words in the explanation and end up finding even
more unknown words. The process could go on indefinitely. Finally I give up.
However, most people do fine with a dictionary, so why am I having such trouble? This is the
classical philosophical problem: can anyone learn a wholly new language from a dictionary alone?
The answer is no. If a dictionary can’t give us the meaning of a word, what is meaning anyway?
In the thought experiment, a dictionary is meaningless to a pure novice, yet the dictionary itself
represents the foundation of meaning in a language. What is a dictionary? In it, each word is
recursively defined by other words. Graphically speaking, a dictionary is a graph that connects words
and represents the relationships between them. A dictionary is a web of relationships. A word’s
meaning is defined by its relationships to other words and by its position in the graph.
Each language has its own graph, and the graphs of different languages are like parallel universes.
That is why we can’t learn another language from a dictionary alone unless we somehow possess enough
of a mapping between our native language and the new one. The mapping between two language systems is
the bridge between parallel universes.
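The idea of a dictionary as a graph can be sketched in a few lines of Python. This is only an illustration: the toy entries below are made up, and each word simply points to the words used in its definition. Walking the graph from one word shows how a single lookup drags in the rest of the vocabulary, which is exactly the trouble the novice runs into.

```python
from collections import deque

# Toy dictionary (hypothetical entries): each word maps to the words
# that appear in its definition, i.e. its outgoing edges in the graph.
definitions = {
    "apple": ["fruit", "tree"],
    "fruit": ["plant", "seed"],
    "tree": ["plant", "wood"],
    "plant": ["living", "thing"],
    "seed": ["plant"],
    "wood": ["tree"],
    "living": ["thing"],
    "thing": [],
}


def words_reached(start, graph):
    """Breadth-first walk: every word you end up looking up from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        for neighbor in graph.get(word, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


reached = words_reached("apple", definitions)
print(sorted(reached))
```

In this toy graph, looking up "apple" eventually touches every other entry, and "wood" and "tree" point back at each other, so the lookups never bottom out in something outside the graph.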
To have meaning, we need a graph. A graph is a prerequisite of meaning. Without it, nothing can be
semantic. This is the ongoing revolution in how the next generation will organize the world’s
information. We want to make the Internet more semantic so that we can do things we couldn’t even have
imagined before. In the context of the Internet, we already have a graph. The next question is how to
make it semantic.
What is semantic web?
Definition of Semantic Web:
“A set of formats and languages that find and analyze data on the World Wide Web, allowing
consumers and businesses to understand all kinds of useful online information.”
In order to understand what the semantic web is, we have to keep the analogy to human spoken
language in mind. A graph is a combination of vertices and edges. The vertices in spoken language are
words. What are the vertices in the World Wide Web?
Over the past two decades, the WWW has developed most rapidly as a medium of documents for people. The
classic example is Wikipedia, which stores over 7 million articles on the web. However, vertices alone
can’t form a graph. We need to connect those documents in a meaningful way so that each one can find
its meaning on the web.

What does connection mean?
Someone may argue that on the Internet we are already so connected. What do you mean, we need to
connect documents again?
Well, connection exists in different forms. What the Internet does for us is connect nodes around the
world. Nodes are machines, but we don’t really care about machines. Machines represent the ability to
process data. The truth is that we care about data. Only data have meaning to humans.
I am not saying that physically connecting hosts is unimportant. The semantic web has to be built on
a physically connected architecture. The last two decades have laid the foundation for future
development. The connections we need to create for the future semantic web are logical connections
between pieces of information on the web.
Why do we want to build a semantic web?
Most of the Web’s content today is designed for humans to read, not for computer programs to
manipulate meaningfully. Computers can adeptly parse Web pages for layout and routine processing, but
in general, computers have no reliable way to process the semantics. The semantic web will bring
structure to the meaningful content of web pages, creating an environment where software roaming from
page to page can readily carry out sophisticated tasks for users.
The semantic web is not a separate web but an extension of the current one, in which information is
given well-defined meaning, better enabling computers and people to work in cooperation. Making it an
extension means we are not going to waste the last 20 years of hard work and start all over again. By
tweaking the old web system, we can unleash its real potential. Tweaking the old system is only the
beginning; the purpose is to enhance people’s lives.
Only through cooperation between humans and machines can we efficiently enhance people’s lives.
Imagine you are making a travel plan for your family. You have two extreme options to accomplish the
task:
1. Delegate the task to a travel agent, who will bring the package to you later.
2. Design and build a super powerful AI and use it.

The first option will solve the task, but we can do better. First, a good agent may be efficient but
doesn’t scale well: the size of the workload doesn’t shrink. Moreover, a travel agent may know your
budget and possible timetable but doesn’t know your lifestyle, your family’s lifestyle, your
preferences and so on. These criteria are essential for a wonderful travel experience.
The second option seems to exclude humans from the decision loop so that the whole process could be
completed automatically, but for now it is even less realistic than the first. Today, expert systems
and AI simulations run only on supercomputers, with applications carefully calibrated by a crew of
experienced professionals. This is money, so much money.
So, with carefully structured documents and websites, an authorized program could check your family’s
schedule, roam through different travel, airline and hotel websites, and leave the hard decisions to
you. For a computer, quantitative problems are easy and qualitative problems are hard; for you it
happens to be the opposite. Forcing a computer to make qualitative decisions would be very inefficient
for now, because current computer architecture is built for mathematical computation, and that is why
pure AI may not be an efficient way to solve human problems. In this instance, neither an AI nor a
real travel agent could answer a question for your family such as: which is better, Greece or France?
The semantic web aims to provide a way to connect discrete documents on the web meaningfully and a
way for humans and computers to cooperate efficiently. In an era of information explosion, we really
do need some help. Manually parsing data, finding patterns, understanding them and reacting seems
outmoded in this generation.
How to build a semantic web?
Building a semantic web is similar to building a language. Any language has three distinctive building
blocks:
1. basic elements
2. ways of combination
3. ways of abstraction
The basic elements of the semantic web are the documents on countless websites. We already have many
of them today. Let me elaborate on ways of combination, which are the essence of transforming
documents into a web of useful data.

Ways of combination
To formalize how documents can connect with each other, what we are really trying to do is build a
language that describes the relationship between any two documents or, more generally, between any two
information entities. The overall goal is the same as building a human language without the emotional
part. Why do we need a new language?
In the human world, to cooperate well, you have to take the benefits of both parties into
consideration. The same principle applies here: we need computers to do more for us so that humans can
skip the boring searching and parsing, get to what we really care about, and make the crucial
decisions. To achieve this, we need a new language designed with machines at heart.
The World Wide Web Consortium has been working very hard to promote and standardize this “semantic
web unified language.” The architecture of the language is a three-layer hierarchy, listed here from
the lowest level to the highest:
• RDF Format
• Ontology Languages
• Inference Engines
RDF - Resource Description Framework
The most fundamental building block is the Resource Description Framework (RDF), a format for
defining information on the Web. Each piece of data, and any link that connects two pieces of data, is
identified by a unique name called a Uniform Resource Identifier, or URI. (URLs, the common Web
addresses that we all use, are special forms of URIs.) In the RDF scheme, two pieces of information
and the link that connects them are grouped together into what is called a triple.
URIs can be agreed on by standards organizations or communities or assigned by individuals.
The relation “is a” is so generally useful, for example, that the consortium has published a standard
URI to represent it. The URI “http://en.wikipedia.org/wiki/Dolphin” could be used by anyone working
with RDF to represent the concept of dolphin. In this way, different people working with different
sets of information can nonetheless share their data about dolphins and television animals. And people
everywhere can merge knowledge bases on large scales.
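As a minimal sketch of the triple idea, the snippet below models triples as plain Python tuples of URI strings rather than using an RDF library. The "is a" URI is the W3C's `rdf:type` and the dolphin URI is the one suggested above; every `example.org` URI is a made-up placeholder, and reading "dolphin is a mammal" as a type statement follows the loose wording of the text rather than strict RDF modelling.

```python
# The W3C's standard URI for the "is a" (type) relation.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# URI suggested in the text for the concept of dolphin.
DOLPHIN = "http://en.wikipedia.org/wiki/Dolphin"
# Hypothetical URIs, invented only for this illustration.
FLIPPER = "http://example.org/animals/Flipper"
TV_ANIMAL = "http://example.org/concepts/TelevisionAnimal"
MAMMAL = "http://example.org/concepts/Mammal"

# Two knowledge bases maintained by different people.
# Each triple is (subject, predicate, object).
biology_kb = {(DOLPHIN, RDF_TYPE, MAMMAL)}
television_kb = {
    (FLIPPER, RDF_TYPE, DOLPHIN),
    (FLIPPER, RDF_TYPE, TV_ANIMAL),
}

# Because both sides use the same URI for "dolphin", merging is a set union.
merged_kb = biology_kb | television_kb


def describe(resource, triples):
    """Return every triple in which the resource is the subject or the object."""
    return sorted(t for t in triples if resource in (t[0], t[2]))


for subject, predicate, obj in describe(DOLPHIN, merged_kb):
    print(subject, predicate, obj)
```

The shared dolphin URI is what lets the two sets line up: after the union, asking about the dolphin returns one fact from each knowledge base.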
Ontology Language
Individuals or groups may want to define terms and data they frequently use, as well as the
relations among those items. This set of definitions is called an ontology. Ontologies can be very
complex (with thousands of terms) or very simple. Web Ontology Language (known as OWL) is one
standard that can be used to define ontologies so that they are compatible with and can be
understood by RDF.
Inference Engines
Ontologies can be imagined as operating one level above RDF. Inference engines operate one
level above the ontologies. These software programs examine different ontologies to find new relations
among the terms and data in them. For example, an inference engine could examine RDF triples stating
that Flipper is a dolphin and that a dolphin is a mammal, and deduce that Flipper is a mammal. Finding
relations among different sources is an important step toward revealing the “meaning” of information.
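The Flipper deduction can be sketched as a tiny forward-chaining loop. This is an illustrative simplification, not how a production reasoner works: it collapses everything into a single hypothetical `is_a` relation and applies one rule (if x is a y and y is a z, then x is a z) until no new triples appear.

```python
IS_A = "is_a"  # hypothetical shorthand for the "is a" relation

triples = {
    ("Flipper", IS_A, "dolphin"),
    ("dolphin", IS_A, "mammal"),
}


def infer_is_a(known):
    """Apply the rule (x is_a y) and (y is_a z) => (x is_a z) to a fixed point."""
    inferred = set(known)
    changed = True
    while changed:
        changed = False
        # Iterate over snapshots so the set can grow inside the loop.
        for (a, p1, b) in list(inferred):
            for (c, p2, d) in list(inferred):
                if p1 == IS_A and p2 == IS_A and b == c:
                    new_triple = (a, IS_A, d)
                    if new_triple not in inferred:
                        inferred.add(new_triple)
                        changed = True
    return inferred


closure = infer_is_a(triples)
print(("Flipper", IS_A, "mammal") in closure)
```

The triple saying that Flipper is a mammal was never stated by anyone; it exists only because the engine combined two facts that may have come from different sources.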
In general, then, RDF names each item and the relations among items in a way that allows computers
and software to interchange the information automatically. Additional power comes from ontologies and
other technologies that create, query, classify and reason about those relations. For example:
• SPARQL, a query language that allows applications to search for specific information within RDF
data.
• GRDDL, which allows people to publish data in their traditional formats, such as HTML or XML,
and specifies how these data can be translated into RDF.
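To give a feel for what a SPARQL query does, the sketch below imitates a single triple pattern in plain Python, with `None` standing in for a query variable. The prefixed names (`ex:Flipper` and so on) are made up for illustration, and the `match` helper is hypothetical rather than part of any SPARQL library. The first lookup corresponds roughly to the query `SELECT ?s WHERE { ?s rdf:type ex:Dolphin }`.

```python
# Made-up data using prefixed names in place of full URIs.
triples = [
    ("ex:Flipper", "rdf:type", "ex:Dolphin"),
    ("ex:Flipper", "ex:starsIn", "ex:TelevisionShow"),
    ("ex:Willy", "rdf:type", "ex:Orca"),
]


def match(pattern, data):
    """Return the triples that fit a (subject, predicate, object) pattern.

    A None in the pattern acts as a wildcard, like a SPARQL variable.
    """
    return [
        triple
        for triple in data
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


# Which resources are dolphins?
dolphins = [s for s, _, _ in match((None, "rdf:type", "ex:Dolphin"), triples)]
print(dolphins)

# Everything that is known about Flipper.
about_flipper = match(("ex:Flipper", None, None), triples)
print(about_flipper)
```

Real SPARQL goes further by joining several patterns on shared variables, but each pattern is evaluated in this spirit: fix the known positions and let the rest vary.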
A more refined hierarchy could be represented as follows:
Difference between original Web 2.0 and semantic web
Just as the HTML and XML languages made the original Web robust, the RDF language and the various
ontologies based on it are maturing. Here are some applications through which we can start to
appreciate the power of semantic web technology.
Applications
Knowledge graph
If you want to understand the state of ongoing neural network simulation around the world, chances
are that the information is distributed across different websites. So you have to google one keyword,
find something and google more keywords. After many iterations of searching and re-searching, you may
start to have a clearer picture. Can we do better?
Like humans, data relate to each other. Try the Google Knowledge Graph. When you search for
something, it returns a graph of linked data. This is much more efficient than repeatedly searching
and re-searching, because the graph gives you the big picture at once.
By using the ideas and technology of the semantic web, users don’t have to depend on Google to build
a graph, and developers don’t have to wait for Google to open an API. As with openstreetmap.org, each
person can contribute to building the map. If everyone does their little part, the quality of the
aggregated map will not be inferior to that of Google Maps. In the same way, we could create our own
knowledge graph and enjoy the power of the semantic web simply by having every participant do his or
her little part.
Information Verification
The process of verifying information could be enhanced too. How do we verify the basic correctness
of information online? Most of the time, we don’t. We choose to believe, relying heavily on big brands
and trusting blindly. Even if you do want to verify a piece of information, the cost is very high.
In the context of the semantic web, if most users carefully calibrate their ontologies and inference
engines, verification is no longer a one-to-one, peer-to-peer affair. We could verify information
across a network, much as papers are peer-reviewed before publication. This distributed verification
would be more efficient than what Web 2.0 offers.
Social network
Nowadays, everyone seems to have joined one or more social networks. I personally use
Facebook, Twitter, Google+, Weibo, WeChat, Line, Instagram and so on. When you post pictures,
chat with friends and share information, your personality seems to be divided by the walls between
different social networks, because data in different networks cannot be integrated and connected. All
the data you create by using any social network service form the image of your online personality. Why
should it be divided?
In the Friend of a Friend (FOAF) project, a data language and ontologies are being used and applied
well. The project created a vocabulary for describing personal information, with which users can
decide what to publish and find common interests with each other.
The basic idea behind FOAF is simple: the Web is all about making connections between
things. FOAF provides some basic machinery to help us tell the Web about the connections between
the things that matter to us. Thousands of people already do this on the Web by describing
themselves and their lives on their home page. Using FOAF, you can help machines understand your
home page, and by doing so, programs could learn about the relationships that connect people,
places and things described on the Web. FOAF uses W3C's RDF technology to integrate information
from your home page with that of your friends, and the friends of your friends.
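The "friends of your friends" step can be sketched with plain triples. The predicate below is the `knows` property from the FOAF vocabulary; the people and their `example.org` URIs are hypothetical, and a real application would read these statements from published FOAF files instead of a hard-coded set.

```python
# The "knows" property from the FOAF vocabulary.
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

# Hypothetical people, as if collected from several personal home pages.
ALICE = "http://example.org/people/alice"
BOB = "http://example.org/people/bob"
CAROL = "http://example.org/people/carol"
DAVE = "http://example.org/people/dave"

triples = {
    (ALICE, FOAF_KNOWS, BOB),    # stated on Alice's page
    (BOB, FOAF_KNOWS, ALICE),    # stated on Bob's page
    (BOB, FOAF_KNOWS, CAROL),
    (CAROL, FOAF_KNOWS, DAVE),   # stated on Carol's page
}


def friends(person, data):
    """People that `person` says they know."""
    return {o for s, p, o in data if s == person and p == FOAF_KNOWS}


def friends_of_friends(person, data):
    """People two steps away who are neither the person nor a direct friend."""
    direct = friends(person, data)
    second_degree = set()
    for friend in direct:
        second_degree |= friends(friend, data)
    return second_degree - direct - {person}


print(friends_of_friends(ALICE, triples))
```

Nothing on Alice's own page mentions Carol; the connection only appears once the statements from her page and Bob's page are combined.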
Drug Discovery
This is a very good example of how humans can work with machines to deliver better results in the
personalization of drugs.
Two challenges:
1. Each person’s unique information.
• Genes
• Physical environment
• Emotional environment
2. Rapidly changing medical knowledge and divided databases.
How do we meld a bewildering array of data sets such as each person’s historic and current medical
records + scientific reports on a number of drugs + drug tests + potential side effects and outcomes
from other patients? This is a decision about a person’s health, so all of the data above have to be
considered, and this is why personalized medication is not yet a reality. How could semantic web
technology help?
A research team at Cincinnati Children’s Hospital Medical Center is leveraging semantic
capabilities to find the underlying genetic causes of cardiovascular diseases.
The team began by downloading onto a workstation the databases that held relevant information but
came from different origins and in incompatible formats. These databases included Gene Ontology
(containing data on genes and gene products), MeSH (focused on diseases and symptoms), Entrez Gene
(gene-centered information) and OMIM (human genes and genetic disorders). The investigators translated
the formats into RDF and stored the information in a Semantic Web database. They then used
Protégé and Jena, freely available Semantic Web software from Stanford University and HP Labs,
respectively, to integrate the knowledge.
The researchers then prioritized the hundreds of genes that might be involved with cardiac
function by applying a ranking algorithm somewhat similar to the one Google uses to rank Web pages
of search results. They found candidate genes that could potentially play a causative role in dilated
cardiomyopathy, a weakening of the heart’s pumping ability. The team instructed the software to
evaluate the ranking information, as well as the genes’ relations to the characteristics and symptoms
of the condition and similar diseases. The software identified four genes with a strong connection to a
chromosomal region implicated in dilated cardiomyopathy. The researchers are now investigating the
effects of these genes’ mutations as possible targets for new therapeutic treatments.
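The text does not say exactly which ranking algorithm the team used, only that it resembles Google's. As a rough illustration of that family of algorithms, here is a textbook PageRank computed by power iteration over a tiny, made-up graph of gene relations. The gene names, the links and the damping factor are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration over a dict of node -> list of targets."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / count for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node without outgoing links spreads its rank evenly.
                share = damping * rank[node] / count
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical relations: an edge means "this gene's record refers to that gene".
gene_links = {
    "geneA": ["geneC"],
    "geneB": ["geneC"],
    "geneC": ["geneD"],
    "geneD": ["geneC"],
}

scores = pagerank(gene_links)
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 3))
```

In this toy graph the gene that most other records point to comes out on top, which is the intuition behind prioritizing candidates by their connections rather than by inspecting each record by hand.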
This job used to be done only by humans. In the traditional research process, computers and
databases provide very limited query functions. Because the databases are divided by their formats,
researchers have to pore over four or five of them, trying to discern possible candidates. This is
obviously a painstaking task. With the help of semantic web technology, searching and
cross-referencing databases can be done by computers.
Together, humans and machines can move and grow faster and more efficiently.
Conclusion and the future
In order to grasp the gist of the semantic web, we need one more thought experiment.

Imagine we live in a world with millions of people but without spoken or written language. People
still have daily routines, see and touch different things, develop tools to facilitate their work, get
inspired and have ideas. However, without language, we can communicate only through sounds, visuals
and sequences of behavior. We don’t learn by reading, but by experiencing, doing, and seeing real
things.
Now we install spoken and written language in this world. What changes? Before we had language, if I
wanted to express the idea of “apple” to you, I had no choice but to find a real apple and show it to
you, even though we both knew by heart what an apple is. The first time we saw an apple, or maybe even
tasted one, we stored the idea of “apple” in our brains. The real apple in the pre-language world acts
as a pointer to the idea of apple in our minds. Communication means I want you to feel what I felt,
see what I saw, and experience what I experienced, at the level of ideas. Without language, we are
severely limited by our source of pointers, because finding a real thing or replaying what just
happened may not be feasible.
Language gives people power because it provides a more efficient source of pointers and ways to
combine them to represent ideas. When you see the word “apple”, I don’t have to show you a real apple
for you to know what I am talking about.
In order to communicate, compute and process data in ways that are semantic to humans, the mere
existence of information is not sufficient; we need an efficient way to link different pieces of
information together. Language is the answer. The semantic web tries to formalize a language with
which we can relate and connect information on countless websites, but this time we do it not purely
for humans. We do it for machines, so that they can in return help us achieve things we couldn’t
achieve without linked data.
The point goes beyond linking documents on websites. Human culture is reaching a tipping point at
which we are promoting the role of computers in our decision-making process. We need them to do more
for us, more actively. By creating a unified semantic web language, we can bridge the human knowledge
system to the still-growing computing power around the world. The power of this synergy is just about
to explode. Connections are not limited to documents or databases but can reach physical things in our
world. In the next two decades, everything from light bulbs to your glasses could be connected to the
Internet, and the semantic web movement will direct this connection so that computers have a role in
it. Applications? Beyond imagination right now. Convenience? Maybe. Ethical, privacy and control
issues? Definitely.
Grand visions rarely progress exactly as planned, but the semantic web is indeed emerging
and making online information more useful than ever.
Go back to this question: What’s the point of all those connections and data, anyway? If we
couldn’t make people’s lives easier and better, why would we need engineering? Here we can choose to
believe that with this approach humans can do better. Moreover, we can actively participate and make
sure things really do get better.
Bibliography
• http://en.wikipedia.org/wiki/Semantic_Web
• http://www.w3.org/standards/semanticweb/
• http://www.w3.org/DesignIssues/LinkedData.html
• http://rdfa.info/
• http://www.foaf-project.org/
• http://en.wikipedia.org/wiki/Lingustics
• http://microformats.org/
• Berners-Lee, Tim; James Hendler and Ora Lassila (May 17, 2001). The Semantic Web.
Scientific American
• John F. Sowa: Principles of semantic networks. Explorations in the representation of
knowledge, Morgan Kaufmann, San Mateo, Cal. 1991, ISBN 1-55860-088-4.
• G.W. Flake, D.M. Pennock, and D.C. Fain, “The Self-Organized Web: The Yin to the
Semantic Web’s Yang,” IEEE Intelligent Systems, July/Aug. 2003, pp. 72–86.