Web of data


Published in: Education, Technology

  1. Web of Data
     Rajendra Akerkar, Western Norway Research Institute, Sogndal, Norway
  2. WWW & Society
     • Social contacts (social networking platforms, blogging, ...)
     • Economics (buying, selling, advertising, ...)
     • Administration (eGovernment)
     • Education (eLearning, Web as information system, ...)
     • Work life (information gathering and sharing)
     • Recreation (games, role play, creativity, ...)
  3. Limitations of the Current Web
     • Too much information with too little structure, and made for human consumption
       • Content search is very simplistic
       • The future requires better methods
     • Web content is heterogeneous
       • in terms of content
       • in terms of structure
       • in terms of character encoding
       • The future requires intelligent information integration
     • Humans can derive new (implicit) information from given pieces of information, but on the current Web we can only deal with syntax; going further requires automated reasoning techniques
  4. Data Integration on the Web
     • Data integration on the Web refers to the process of combining and aggregating information resources on the Web so that they can be collectively useful to us.
     • Goal: for a given resource (say, a person, an idea, an event, or a product), we would like to know everything that has been said about it.
  5. • Myself as the resource.
     • Assume we have already built a “smart” agent, which will walk around the Web to find everything about me.
  6. • To get our smart agent started, we feed it with the URL of my personal home page: http://www.tmrfindia.org/ra.html
     • The agent downloads this page and tries to collect information from it.
  7. Web page - a traditional Web document
     • Our agent is able to understand HTML language constructs such as <p>, <br>, <a href>, <table> and <li>.
  8. Web page - a non-traditional Web document
     • Besides the HTML constructs, it actually contains some “statements”.
     • These statements follow the same simple structure.
     • Each one of them represents one aspect of the given resource.
       ns0:RajendraAkerkar ns0:name "Rajendra Akerkar".
       ns0:RajendraAkerkar ns0:title "Professor".
       ns0:RajendraAkerkar ns0:author <ns0:_x>.
       ns0:_x ns0:ISBN "978-1-84265-535-1".
       ns0:_x ns0:publisher <http://www.alphasci.com>.
  9. Namespace - a mechanism for abbreviating URIs
     • ns0 represents a namespace, so that we know that everything with ns0 as its prefix is collected from the same Web page.
     • ns0:RajendraAkerkar represents a resource that is described by my Web page; in this case, this resource is me.
     • So, resource ns0:RajendraAkerkar has an ns0:name property whose value is Rajendra Akerkar.
  10. • The 2nd statement claims that the ns0:title property of resource ns0:RajendraAkerkar has the value Professor.
      • The 3rd statement is unusual. When specifying the value of the ns0:author property for resource ns0:RajendraAkerkar, instead of using a simple character string as its value, it uses another resource, and this resource is identified by ns0:_x. To make this fact more obvious, ns0:_x is enclosed in <>.
      • The 4th statement specifies the value of the ns0:ISBN property of resource ns0:_x.
      • The last statement specifies the value of the ns0:publisher property of the same resource. The value of this property is not a character string, but another resource identified by http://www.alphasci.com.
  11. How much does our agent understand these statements?
      • The agent organizes them into a graph.
  12. A graph generated by the agent after visiting the Web page
      [Figure: graph centred on ns0:RajendraAkerkar. Edge labels: ns0:name, ns0:title, ns0:e-mail, ns0:homepage, ns0:author, ns0:ISBN, ns0:title, ns0:publisher. Node values: "Rajendra Akerkar", Professor, akerkar8@gmail.com, http://www.tmrfindia.org/ra.html, 978-1-84265-535-1, Foundations of the Semantic Web: XML, RDF & Ontologies, http://www.alphasci.com]
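
The step from statements to a graph can be sketched in a few lines of Python. This is only an illustration, not the agent's actual implementation: the `build_graph` helper is hypothetical, and plain strings stand in for both resources and literal values.

```python
from collections import defaultdict

# Statements from the home page, written as (subject, predicate, object) tuples.
# Assumption: plain strings represent both resources and literal values.
statements = [
    ("ns0:RajendraAkerkar", "ns0:name", "Rajendra Akerkar"),
    ("ns0:RajendraAkerkar", "ns0:title", "Professor"),
    ("ns0:RajendraAkerkar", "ns0:author", "ns0:_x"),
    ("ns0:_x", "ns0:ISBN", "978-1-84265-535-1"),
    ("ns0:_x", "ns0:publisher", "http://www.alphasci.com"),
]


def build_graph(triples):
    """Index triples as subject -> list of outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


graph = build_graph(statements)

# Print every edge of the graph, one per line.
for subject, edges in graph.items():
    for predicate, obj in edges:
        print(f"{subject} --{predicate}--> {obj}")
```

Each subject becomes a node, and each statement becomes a labelled edge leaving that node, which is the shape of the graph on the slide.
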
  13. The agent hits another Web page: www.amazon.com
      • Existing Amazon: the agent doesn’t know how to retrieve information about an ISBN number.
      • New Amazon: the agent can collect statements such as
        ns1:book-1842655353 ns1:ISBN "978-1-84265-535-1".
        ns1:book-1842655353 ns1:price USD 68.80.
        ns1:book-1842655353 ns1:customerReview "4 star".
      • Similar to the namespace prefix ns0, ns1 represents another namespace prefix.
      • Graph?
  14. The graph generated by the agent after visiting Amazon.com
  15. An obvious fact for us
      • ns0:_x, as a resource, represents exactly the same item denoted by the resource named ns1:book-1842655353.
      • Observation: a person who has a home page with its URL given by http://www.tmrfindia.org/ra.html has a book published, and the latest price of that book is US $68.80 on Amazon.
      • This fact is not explicitly stated on either one of the Web sites, but we have integrated the information to reach this conclusion.
  16. The agent does data integration
      • It makes a connection between the two appearances of the ISBN in two different sets of statements.
      • It will then automatically add the following new statement to its original statement collection:
        ns0:_x sameAs ns1:book-1842655353
      • This process is exactly the data integration process on the Web.
      • Graph?
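
A minimal Python sketch of this integration step follows. It rests on an assumption the slides do not spell out: the agent treats ns0:ISBN and ns1:ISBN as the same kind of property because their local names match. The helpers `local_name` and `link_by_isbn` are hypothetical names used only for illustration.

```python
from itertools import product

# Statements collected from the home page (ns0) and from Amazon (ns1).
home_page = [
    ("ns0:RajendraAkerkar", "ns0:author", "ns0:_x"),
    ("ns0:_x", "ns0:ISBN", "978-1-84265-535-1"),
]
amazon = [
    ("ns1:book-1842655353", "ns1:ISBN", "978-1-84265-535-1"),
    ("ns1:book-1842655353", "ns1:price", "USD 68.80"),
]


def local_name(term):
    """Strip the namespace prefix: 'ns0:ISBN' -> 'ISBN'."""
    return term.split(":", 1)[1]


def link_by_isbn(first, second):
    """Return sameAs statements for subjects that carry the same ISBN value.

    Assumption: two properties both named 'ISBN' mean the same thing,
    even though they come from different namespaces.
    """
    links = []
    for (s1, p1, o1), (s2, p2, o2) in product(first, second):
        if local_name(p1) == "ISBN" and local_name(p2) == "ISBN" and o1 == o2:
            links.append((s1, "sameAs", s2))
    return links


new_statements = link_by_isbn(home_page, amazon)
print(new_statements)
```

With the two statement sets above, the only pair sharing an ISBN value is ns0:_x and ns1:book-1842655353, so a single sameAs statement is produced.
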
  18. What can the agent do?
      • It can answer lots of questions that we might have.
      • For example: what is the price of the book written by a person whose home page is given by the URL http://www.tmrfindia.org/ra.html?
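
The example question can be answered by walking the integrated statements. The sketch below assumes the agent already holds the ns0:homepage statement shown in the home-page graph and the sameAs statement linking the two book resources; `objects`, `subjects` and `price_of_books_by` are hypothetical helper names.

```python
# Integrated statement collection (home page + Amazon + inferred sameAs link).
triples = [
    ("ns0:RajendraAkerkar", "ns0:homepage", "http://www.tmrfindia.org/ra.html"),
    ("ns0:RajendraAkerkar", "ns0:author", "ns0:_x"),
    ("ns0:_x", "sameAs", "ns1:book-1842655353"),
    ("ns1:book-1842655353", "ns1:price", "USD 68.80"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a known statement."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is a known statement."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def price_of_books_by(homepage):
    """Follow homepage -> person -> book -> sameAs -> price."""
    prices = []
    for person in subjects("ns0:homepage", homepage):
        for book in objects(person, "ns0:author"):
            for same_book in objects(book, "sameAs"):
                prices.extend(objects(same_book, "ns1:price"))
    return prices


print(price_of_books_by("http://www.tmrfindia.org/ra.html"))
```

Without the sameAs statement the walk stops at ns0:_x, which has no price of its own; the answer only exists in the integrated collection.
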
  19. Yet another attempt
      • Let us say our agent now hits www.linkedIn.com.
      • If LinkedIn were still the LinkedIn of today, our agent could not do much.
      • However, assume LinkedIn is a new LinkedIn and our agent is able to collect quite a few statements from this Web site:
        ns2:RajendraAkerkar ns2:email "akerkar8@gmail.com".
        ns2:RajendraAkerkar ns2:companyWebsite "http://www.vestforsk.no".
        ns2:RajendraAkerkar ns2:connectedTo <ns2:Jacques>.
      • Graph?
  20. A graph generated by the agent after visiting linkedIn.com
      [Figure: graph centred on ns2:RajendraAkerkar. Edge labels: ns2:currentJob, ns2:companyWebsite, ns2:address, ns2:email, ns2:connectedTo, ns2:country. Other nodes: ns2:Professor, http://www.vestforsk.no, akerkar8@gmail.com, ns2:Norway, ns2:Jacques]
  21. • We know that ns0:RajendraAkerkar and ns2:RajendraAkerkar represent exactly the same resource, because both resources have the same e-mail address.
      • For our agent, just comparing the two identities (ns0:RajendraAkerkar vs. ns2:RajendraAkerkar) does not ensure that these two resources are the same.
      • However, we can “teach” our agent the following fact: if the e-mail property of resource A has the same value as the e-mail property of resource B, then resources A and B are the same resource.
      • Then our agent will be able to automatically add the following new statement to its current statement collection:
        ns0:RajendraAkerkar sameAs ns2:RajendraAkerkar.
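
The e-mail rule can be written down as a small function. This is a sketch under a stated assumption: the agent has been told that ns0:e-mail and ns2:email are both "e-mail" properties (the set `EMAIL_PROPERTIES` below). The function name `same_as_by_email` is hypothetical.

```python
from collections import defaultdict

# Statements from the home page (ns0) and from LinkedIn (ns2).
triples = [
    ("ns0:RajendraAkerkar", "ns0:e-mail", "akerkar8@gmail.com"),
    ("ns2:RajendraAkerkar", "ns2:email", "akerkar8@gmail.com"),
    ("ns2:RajendraAkerkar", "ns2:connectedTo", "ns2:Jacques"),
]

# Assumption: the agent knows which properties denote an e-mail address.
EMAIL_PROPERTIES = {"ns0:e-mail", "ns2:email"}


def same_as_by_email(statements):
    """Apply the rule: same e-mail value => same resource."""
    owners = defaultdict(list)  # e-mail address -> resources that have it
    for subject, predicate, obj in statements:
        if predicate in EMAIL_PROPERTIES and subject not in owners[obj]:
            owners[obj].append(subject)

    links = []
    for resources in owners.values():
        first = resources[0]
        # Link every further resource with this address to the first one.
        links.extend((first, "sameAs", other) for other in resources[1:])
    return links


inferred = same_as_by_email(triples)
print(inferred)
```

The rule only fires when two different resources share an address, so ns2:Jacques, which has no e-mail statement, is left untouched.
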
  22. • With the creation of this new statement, our agent has in fact integrated the graphs by overlapping nodes.
      • Now the agent will be able to answer more questions:
        • What is Rajendra’s company website?
        • How much does it cost to buy Rajendra’s book?
        • Which country does Rajendra live in?
      • The agent answers using the integrated graph.
  23. Automatic data integration
      • Obviously, the set of questions that our agent is able to answer grows as it hits more Web documents.
      • We can continue on to another Web site so as to add more statements to our agent’s collection.
      • Automatic data integration on the Web can be quite powerful and can help us a lot when it comes to information discovery and retrieval.
  24. Smart Data Integration Agent: the Web and the agent
      The Web must change from its traditional form:
      • Each statement collected by our agent represents a piece of knowledge, so we need a model to represent knowledge on the Web.
      • Such a model of representing knowledge has to be easily and readily processed (understood) by machines.
      • This model has to be accepted as a standard by all Web sites, so that they share a common pattern.
      • There has to be a way to create such statements (manually or automatically).
      • The statements contained in different Web sites cannot be completely arbitrary (e.g., to describe a person, we have some common terms such as name, birthdate, and home page).
      • There has to be agreement on common terms and relationships.
      A new breed of Web ...!
  25. Smart Data Integration Agent: the new agent
      • The agent has to be able to understand each statement that it collects, by understanding the common terms and relationships that are used to create these statements.
      • The agent has to be able to conduct reasoning based on its understanding of the common terms and relationships. For example, knowing that resources A and B have the same e-mail address, and considering the knowledge expressed by the common terms and relationships, it should be able to conclude that A and B are in fact the same resource.
      • The agent should be able to process some common queries that are submitted against the statements it has collected.
      • Some more to be included ...
  26. The Idea of the Semantic Web
      The Semantic Web provides the technologies and standards that we need to make the following possible:
      • it adds machine-understandable meanings to the current Web, so that
      • computers can understand Web documents and can therefore automatically
      • accomplish tasks that have otherwise been conducted manually, on a large scale.
  27. Idea of the Semantic Web
      • The Semantic Web provides the technologies and standards that we need to make our agent possible.
      • It is a brand new layer built on top of the current Web, and it adds machine-understandable meanings (or “semantics”) to the current Web.
      • The Semantic Web is certainly more than automatic data integration on a large scale.
  28. What is the Semantic Web?
      • “The Semantic Web: … content that is meaningful to computers [and that] will unleash a revolution of new possibilities … Properly designed, the Semantic Web can assist the evolution of human knowledge …” Tim Berners-Lee, …, Weaving the Web
      • The Semantic Web is supposed to make data located anywhere on the Web accessible and understandable, both to people and machines. This is more a vision than a technology.
  29. The Web as envisioned by Tim
      Tim Berners-Lee has a two-part vision for the future of the Web:
      o The first part is to make the Web a more collaborative medium.
      o The second part is to make the Web understandable and thus processable by machines.
  30. The Web as envisioned by Tim
      [Figure: Tim Berners-Lee’s original diagram of his vision]
  31. The change between the current Web and the Semantic Web? Current Web
      • Resources: identified by URIs; untyped
      • Links: href, src, ...; limited, non-descriptive
      • User: exciting world; the semantics of the resource, however, are gleaned from content
      • Machine: very little information available; the significance of the links is only evident from the context around the anchor
  32. The change between the current Web and the Semantic Web? Semantic Web
      • Resources: globally identified by URIs or locally scoped (blank); extensible; relational
      • Links: identified by URIs; extensible; relational
      • User: even more exciting world, richer user experience
      • Machine: more processable information is available (Data Web)
      • Computers and people: work, learn and exchange knowledge effectively
  33. A Layered Approach
      • The development of the Semantic Web proceeds in steps
      • Each step builds a layer on top of another
      • Principles:
        • Downward compatibility
        • Upward partial understanding
      Chapter 1, A Semantic Web Primer
  34. The Semantic Web in W3C’s view
  35. An Alternative Layer Stack
      • Takes recent developments into account
      • The main differences are:
        − The ontology layer is instantiated with two alternatives: the current standard Web ontology language, OWL, and a rule-based language
        − DLP is the intersection of OWL and Horn logic, and serves as a common foundation
      • The Semantic Web architecture is currently being debated and may be subject to refinements and modifications in the future.
  36. Alternative Semantic Web Stack
  37. Semantic Web Layers
      • XML layer
        • Syntactic basis
      • RDF layer
        • RDF: basic data model for facts
        • RDF Schema: simple ontology language
      • Ontology layer
        • More expressive languages than RDF Schema
        • Current Web standard: OWL
  38. Semantic Web Layers (2)
      • Logic layer
        • Enhances ontology languages further
        • Application-specific declarative knowledge
      • Proof layer
        • Proof generation, exchange, validation
      • Trust layer
        • Digital signatures
        • Recommendations, rating agencies, ...
  39. Semantic Web Challenges
      • The Web is distributed
        • many sources, varying authority
        • inconsistency
      • The Web is dynamic
        • representational needs may change
      • The Web is enormous
        • systems must scale well
      • The Web is an open world
  40. References
      • R. Akerkar, Foundations of the Semantic Web, Narosa Publishing House, New Delhi, and Alpha Science International, London, ISBN 978-81-7319-985-1.
      • T. Berners-Lee, J. Hendler, O. Lassila (2001), The Semantic Web, Scientific American 284(5):34–43.
      • Liyang Yu, A Developer’s Guide to the Semantic Web, Springer, ISBN 978-3-642-15969-5.
      • Pascal Hitzler, Markus Krötzsch, Sebastian Rudolph, Foundations of Semantic Web Technologies, CRC Press/Chapman and Hall (2009).
      • http://www.w3.org/2001/sw/SW-FAQ
      • http://www.w3.org/2001/sw/