Lecture 1 of the course Semantic Web Technologies (presented at the Free University of Bozen-Bolzano, 2013)

  • We need:
    • A data model
    • A query language
    • Standards and tools to publish the data
    • Standards and tools to consume the data
  • Big players are betting on this
    • 1. Semantic Web Technologies 2012-2013, Part I. Mariano Rodriguez-Muro, Free University of Bozen-Bolzano
    • 2. Disclaimer. License: This work is licensed under the Creative Commons Attribution-Share Alike 3.0 License
    • 3. Intro: course organization; intro to the Semantic Web; intro to Semantic Technologies
    • 4. Course organization
    • 5. About me: Mariano Rodríguez-Muro, Assistant Professor at KRDB, Faculty of Computer Science (POS Building, 202). Tel. +390471016228, rodriguez =at= Research interests: techniques for query answering optimization; SPARQL, Big RDFS, virtual RDF; data integration with Semantic Technologies; SemTech in the enterprise.
    • 6. About you: Which program? Which semester? Why are you here? (The topic relates to my area? Looking for a project/thesis? Just interesting? The topic is mandatory? Need some credits? Special interests?)
    • 7. Course organization (Part I): Website: Moodle … Schedule: Lecture: Tuesday 10:30 am to 12:30 pm; Lecture: Thursday 8:30 am to 10:30 am; Lab: Tuesday 2:00 to 4:00 pm. Office hours: by appointment. Please use the forums as the main means of communication.
    • 8. Reference material: slides and papers, plus:
      • Foundations of Semantic Web Technologies. Pascal Hitzler, Markus Krötzsch and Sebastian Rudolph. Chapman & Hall/CRC, 2010. (Code FSW)
      • Semantic Web Programming. John Hebeler et al. Wiley, 2009. (Code SWP)
      • Programming the Semantic Web. Toby Segaran, Colin Evans and Jamie Taylor. O'Reilly, 2009. (Code PTSW)
      • Available at the library; SWP and PTSW are also available as e-books.
    • 9. Grading: Part I 50%, Part II 50%. Grading of Part I: lab exercises 15%, mid-term 35%. Exercises: a new assignment each week; all assignments are graded and mandatory, and must be delivered by the following week. Java and SQL/JDBC are required. Projects must be packaged with Maven. Mid-term: covers all material seen during the lectures, from slides, presentations and selected book chapters/readings (marked at the end of each set of slides).
    • 10. Introduction: Semantic Web
    • 11. Web of documents: primary objects: documents; degree of structure in data: low; semantics of content: implicit; designed for: human consumption; links between documents
    • 12. Web of documents: the problem
    • 13. Example: Elvis
    • 14. Web of data: the problem. How about this query: "How many romantic comedy Hollywood movies are directed by a person who was born in a city with an average temperature above 15 degrees?" You need to find reliable sources containing facts about movies (genre and director), birthplaces of famous artists/directors, average temperatures of cities across the world, etc. The result is several lists of thousands of facts. You then have to integrate all the data and join facts that come from heterogeneous sources. Even if possible, it may take days to answer just a single query!
    • 15. The vision: "I have a dream for the Web in which computers become capable of analyzing all the data on the Web - the content, links, and transactions between people and computers. A Semantic Web, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The intelligent agents people have touted for ages will finally materialize." (Tim Berners-Lee, 1999)
    • 16. The Semantic Web: primary objects: things; links between: things; degree of structure: high; explicit semantics of contents and links; designed for both machines and humans
    • 17. Web of data
    • 18. Semantic Technologies
    • 19. Not only about the web: the Semantic Web vision has generated technologies that are applied outside the web context, including government; research (bio, geo, cultural heritage, etc.); software development; enterprise intelligence … Semantic technologies provide flexible and powerful tools to accomplish things that were not possible or not practical in the past.
    • 20. Introduction to the Semantic Web approach: how does a Semantic Web approach help us merge data sets, infer new relations, and integrate outside data sources?
    • 21. The rough structure of data integration with SWT:
      1. Map the various data onto an abstract data representation, making the data independent of its internal representation
      2. Merge the resulting representations
      3. Start making queries on the whole, including queries that are not possible on the individual data sets
    • 22. Data set "A": a simplified book store
      • Books (ID | Author | Title | Publisher | Year): ISBN0-00-651409-X | id_xyz | The Glass Palace | id_qpr | 2000
      • Authors (ID | Name | Home page): id_xyz | Ghosh, Amitav | …
      • Publishers (ID | Publisher Name | City): id_qpr | Harper Collins | London
    • 23. 1st: Export your data as a set of relations
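As a minimal sketch of this first step (an illustration, not the course's tooling), the Python below turns the rows of data set "A" into (subject, predicate, object) triples. The property names such as a:title and a:p_name, the urn:isbn: URI scheme and the isbn_uri helper are assumptions made for the example; the idea shown is that each row becomes a resource and each foreign key becomes a link.

```python
# Sketch: export data set "A" (three relational tables) as a set of triples.
# Assumptions: triples are (subject, predicate, object) tuples of strings;
# the "a:" property names and the "urn:isbn:" URI scheme are illustrative choices.

def isbn_uri(raw_id):
    """Turn a raw ISBN identifier into a URI: drop the 'ISBN' prefix, hyphens and spaces."""
    digits = raw_id.upper().replace("ISBN", "").replace("-", "").replace(" ", "")
    return "urn:isbn:" + digits

# Rows of data set "A" as on the slide (the home page value is left out).
books = [{"ID": "ISBN0-00-651409-X", "Author": "id_xyz", "Title": "The Glass Palace",
          "Publisher": "id_qpr", "Year": "2000"}]
authors = [{"ID": "id_xyz", "Name": "Ghosh, Amitav"}]
publishers = [{"ID": "id_qpr", "Publisher Name": "Harper Collins", "City": "London"}]

def export_a():
    triples = set()
    for row in books:
        book = isbn_uri(row["ID"])
        triples.add((book, "a:title", row["Title"]))
        triples.add((book, "a:year", row["Year"]))
        # Foreign keys become links between resources rather than plain values.
        triples.add((book, "a:author", "a:" + row["Author"]))
        triples.add((book, "a:publisher", "a:" + row["Publisher"]))
    for row in authors:
        triples.add(("a:" + row["ID"], "a:name", row["Name"]))
    for row in publishers:
        publisher = "a:" + row["ID"]
        triples.add((publisher, "a:p_name", row["Publisher Name"]))
        triples.add((publisher, "a:city", row["City"]))
    return triples

for triple in sorted(export_a()):
    print(triple)
```

Because isbn_uri drops the prefix, hyphens and spaces, it also maps ISBN-0-00-651409-X (the spelling used by data set "F") to the same URI; that shared URI is what later lets the two data sets connect.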
    • 24. Some notes on the data export: exporting does not necessarily mean physically converting the data. Relations can be virtual, generated on the fly at query time via SQL "bridges", by scraping HTML pages, by extracting data from Excel sheets, etc. One can also export only part of the data.
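To make the "virtual relation" idea concrete, here is a hedged sketch that uses Python's built-in sqlite3 module as a stand-in for an SQL bridge: the triples are never stored, they are generated from the table each time a query asks for them. The authors table, the a:name property and the extra "Hypothetical Author" row are assumptions for illustration.

```python
import sqlite3

# Hypothetical in-memory database standing in for the book store's RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO authors VALUES ('id_xyz', 'Ghosh, Amitav')")

def author_triples(connection):
    """A virtual relation: triples are produced from the live table every time they are requested."""
    for author_id, name in connection.execute("SELECT id, name FROM authors ORDER BY id"):
        yield ("a:" + author_id, "a:name", name)

print(list(author_triples(conn)))

# A later change in the database is visible to the next query without re-exporting anything.
conn.execute("INSERT INTO authors VALUES ('id_new', 'Hypothetical Author')")
print(list(author_triples(conn)))
```

The design choice is laziness: nothing is copied, so the triples always reflect the current state of the source, at the price of running SQL at query time.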
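    • 25. Data set "F": another book store's data, kept in a spreadsheet
      • Books (ID | Titre | Traducteur | Original): ISBN0 2020386682 | Le Palais des miroirs | A13 | ISBN-0-00-651409-X
      • Authors (ID | Auteur): ISBN-0-00-651409-X | A12
      • Names (Nom, column A, rows 12 and 13): A12 = Ghosh, Amitav; A13 = Besse, Christianne
      • Traducteur and Auteur hold cell references (A13, A12) into the Nom column rather than names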
    • 26. 2nd: Export your second set of data
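A possible export of data set "F", sketched under similar assumptions: triples are tuples, the f:titre, f:traducteur and f:original property names are illustrative, and each referenced name cell gets a made-up URI such as "f:A12". The step worth noticing is resolving the spreadsheet's cell references into links between resources.

```python
# Sketch: export data set "F" (a spreadsheet) as triples.
# Assumptions: rows are given as dicts; "f:titre", "f:traducteur", "f:original"
# and the "f:<cell>" person URIs are hypothetical; f:auteur and f:nom come from the slides.

def isbn_uri(raw_id):
    """Turn a raw ISBN identifier into a URI: drop the 'ISBN' prefix, hyphens and spaces."""
    digits = raw_id.upper().replace("ISBN", "").replace("-", "").replace(" ", "")
    return "urn:isbn:" + digits

book_rows = [{"ID": "ISBN0 2020386682", "Titre": "Le Palais des miroirs",
              "Traducteur": "A13", "Original": "ISBN-0-00-651409-X"}]
author_rows = [{"ID": "ISBN-0-00-651409-X", "Auteur": "A12"}]
name_cells = {"A12": "Ghosh, Amitav", "A13": "Besse, Christianne"}

def person_uri(cell_ref):
    # One resource per referenced name cell (hypothetical naming scheme).
    return "f:" + cell_ref

def export_f():
    triples = set()
    for row in book_rows:
        book = isbn_uri(row["ID"])
        triples.add((book, "f:titre", row["Titre"]))
        triples.add((book, "f:traducteur", person_uri(row["Traducteur"])))
        triples.add((book, "f:original", isbn_uri(row["Original"])))
    for row in author_rows:
        triples.add((isbn_uri(row["ID"]), "f:auteur", person_uri(row["Auteur"])))
    # Resolve the cell references: each referenced cell holds a person's name.
    for ref, name in name_cells.items():
        triples.add((person_uri(ref), "f:nom", name))
    return triples

for triple in sorted(export_f()):
    print(triple)
```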
    • 27. 3rd: Start merging your data
    • 28. 3rd: Start merging your data (cont'd)
    • 29. 4th: Merge identical resources
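Merging can be sketched as a plain set union of triples: because both exports use the same URI for the book with ISBN 0-00-651409-X, the union links the two graphs at that node without any extra work. The trimmed-down triples and URIs below are illustrative assumptions.

```python
# Sketch: merging = set union of triples; identical URIs become a single node.
# Assumption: small, illustrative excerpts of the triples exported from "A" and "F".

a_triples = {
    ("urn:isbn:000651409X", "a:title", "The Glass Palace"),
    ("urn:isbn:000651409X", "a:author", "a:id_xyz"),
    ("a:id_xyz", "a:name", "Ghosh, Amitav"),
}
f_triples = {
    ("urn:isbn:02020386682", "f:titre", "Le Palais des miroirs"),
    ("urn:isbn:02020386682", "f:original", "urn:isbn:000651409X"),
    ("urn:isbn:000651409X", "f:auteur", "f:A12"),
    ("f:A12", "f:nom", "Ghosh, Amitav"),
}

def merge(*graphs):
    """Merge graphs by taking the union of their triples."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged

def shared_subjects(g1, g2):
    """Resources described by both graphs: the points where the merge joins them."""
    return {s for s, _, _ in g1} & {s for s, _, _ in g2}

merged = merge(a_triples, f_triples)
print(len(merged), shared_subjects(a_triples, f_triples))
```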
    • 30. Start making queries… The user of data set "F" can now ask queries like: "What is the title of the original version of Le Palais des miroirs?" This information is not in data set "F"… but it can be retrieved after merging with data set "A"!
    • 31. 5th: Query the merged data set
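The query above amounts to a join over the merged triples. This sketch writes the join by hand in Python rather than using a SPARQL engine; the comment shows roughly the corresponding SPARQL triple patterns. The property names f:titre, f:original and a:title and the URIs are assumptions.

```python
# Sketch: "What is the title of the original version of Le Palais des miroirs?"
# answered by a hand-written join over a small, illustrative merged graph.

merged = {
    ("urn:isbn:000651409X", "a:title", "The Glass Palace"),
    ("urn:isbn:000651409X", "a:author", "a:id_xyz"),
    ("urn:isbn:02020386682", "f:titre", "Le Palais des miroirs"),
    ("urn:isbn:02020386682", "f:original", "urn:isbn:000651409X"),
}

def subjects(graph, predicate, obj):
    return {s for s, p, o in graph if p == predicate and o == obj}

def objects(graph, subject, predicate):
    return {o for s, p, o in graph if s == subject and p == predicate}

def original_title(graph, french_title):
    # Roughly the SPARQL pattern:
    #   ?book f:titre "Le Palais des miroirs" . ?book f:original ?orig . ?orig a:title ?title
    titles = set()
    for book in subjects(graph, "f:titre", french_title):
        for original in objects(graph, book, "f:original"):
            titles |= objects(graph, original, "a:title")
    return titles

print(original_title(merged, "Le Palais des miroirs"))
```

Run on the "F" triples alone, the same function returns an empty set, since a:title only comes from data set "A".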
    • 32. However, more can be achieved… We "know" that a:author and f:auteur are really the same, but our automatic merge does not know that! Let us add some extra information to the merged data: a:author is equivalent to f:auteur; both identify a Person, a category (type) for certain resources; a:name and f:nom are equivalent to foaf:name.
    • 33. 3rd revisited: Use the extra knowledge
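One way to "use the extra knowledge" is forward chaining: apply the glue statements to the data until no new triples appear. This sketch assumes the equivalences are written with owl:equivalentProperty and that "both identify a Person" is written as an rdfs:range of foaf:Person; it implements only those two rules, not a full OWL reasoner.

```python
# Sketch: materialize the consequences of the "glue" statements by forward chaining.
# Assumptions: glue encoded with owl:equivalentProperty and rdfs:range; only these two rules.

data = {
    ("urn:isbn:000651409X", "a:author", "a:id_xyz"),
    ("a:id_xyz", "a:name", "Ghosh, Amitav"),
    ("urn:isbn:000651409X", "f:auteur", "f:A12"),
    ("f:A12", "f:nom", "Ghosh, Amitav"),
}
glue = {
    ("a:author", "owl:equivalentProperty", "f:auteur"),
    ("a:name", "owl:equivalentProperty", "foaf:name"),
    ("f:nom", "owl:equivalentProperty", "foaf:name"),
    ("a:author", "rdfs:range", "foaf:Person"),
    ("f:auteur", "rdfs:range", "foaf:Person"),
}

def saturate(graph):
    g = set(graph)
    while True:
        # Equivalence is symmetric; chains such as a:name -> foaf:name -> f:nom
        # are followed by repeating the loop.
        eq = {(p, q) for p, r, q in g if r == "owl:equivalentProperty"}
        eq |= {(q, p) for p, q in eq}
        ranges = {(p, c) for p, r, c in g if r == "rdfs:range"}
        new = set()
        for s, p, o in g:
            new |= {(s, q, o) for p1, q in eq if p1 == p}
            new |= {(o, "rdf:type", c) for p1, c in ranges if p1 == p}
        if new <= g:  # fixpoint reached: nothing new can be derived
            return g
        g |= new

inferred = saturate(data | glue)
for triple in sorted(inferred - data - glue):
    print(triple)
```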
    • 34. Start making richer queries! The user of data set "F" can now ask: "What is the home page of the 'auteur' of Le Palais des miroirs?" The information is in neither data set "F" nor data set "A" alone… but it was made available by merging data sets "A" and "F" and adding three simple "glue" statements.
    • 35. 6th: Richer queries
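Instead of materializing derived triples, a query engine can also rewrite the query using the glue: wherever the query asks for f:auteur, it also accepts any equivalent property. This is a hedged sketch of that alternative; a:homepage is an assumed property name and HOMEPAGE is a placeholder value, not the author's real home page.

```python
# Sketch: query rewriting with owl:equivalentProperty instead of forward chaining.
# Assumptions: illustrative URIs; a:homepage and HOMEPAGE are hypothetical placeholders.

HOMEPAGE = "http://example.org/hypothetical-homepage"

merged = {
    ("urn:isbn:02020386682", "f:titre", "Le Palais des miroirs"),
    ("urn:isbn:02020386682", "f:original", "urn:isbn:000651409X"),
    ("urn:isbn:000651409X", "f:auteur", "f:A12"),
    ("urn:isbn:000651409X", "a:author", "a:id_xyz"),
    ("a:id_xyz", "a:homepage", HOMEPAGE),
}
glue = {("a:author", "owl:equivalentProperty", "f:auteur")}

def equivalents(glue, prop):
    """All properties equivalent to prop (symmetric and transitive closure over the glue)."""
    found, todo = {prop}, [prop]
    while todo:
        current = todo.pop()
        for s, rel, o in glue:
            if rel != "owl:equivalentProperty":
                continue
            for x, y in ((s, o), (o, s)):
                if x == current and y not in found:
                    found.add(y)
                    todo.append(y)
    return found

def objects(graph, subject, props):
    return {o for s, p, o in graph if s == subject and p in props}

def auteur_homepages(graph, glue, french_title):
    auteur_props = equivalents(glue, "f:auteur")  # rewritten: f:auteur or a:author
    pages = set()
    for book in {s for s, p, o in graph if p == "f:titre" and o == french_title}:
        for original in objects(graph, book, {"f:original"}):
            for person in objects(graph, original, auteur_props):
                pages |= objects(graph, person, {"a:homepage"})
    return pages

print(auteur_homepages(merged, glue, "Le Palais des miroirs"))
```

With an empty glue set the same query finds no home page, which mirrors the slide's point: the answer needs both the merge and the glue statement.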
    • 36. Bring in other data sources: we can integrate new information into our merged data set from other sources, e.g. additional information about the author Amitav Ghosh. Perhaps the largest public source of general knowledge is Wikipedia; structured data can be extracted from Wikipedia using dedicated tools. (May 12, 2009)
    • 37. 7th: Merge with Wikipedia data (owl:sameAs)
    • 38. 7th (cont'd): Merge with Wikipedia data (owl:sameAs)
    • 39. 7th (cont'd): Merge with Wikipedia data (owl:sameAs)
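The owl:sameAs merge can be sketched as "smushing": nodes declared to be the same are grouped with a union-find structure, and every triple is rewritten onto one canonical node. The dbpedia: and w: names and the hand-written owl:sameAs link are illustrative assumptions; real systems may instead keep the sameAs links and reason over them.

```python
# Sketch: merge data sets through owl:sameAs by rewriting each sameAs class to one node.
# Assumptions: illustrative "dbpedia:" / "w:" names; the sameAs link is stated by hand.

merged = {
    ("urn:isbn:000651409X", "a:author", "a:id_xyz"),
    ("a:id_xyz", "a:name", "Ghosh, Amitav"),
}
wikipedia = {("dbpedia:Amitav_Ghosh", "w:birthPlace", "dbpedia:Kolkata")}
links = {("a:id_xyz", "owl:sameAs", "dbpedia:Amitav_Ghosh")}

def smush(graph):
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            # Deterministic choice: the alphabetically smallest name becomes canonical.
            parent[max(rx, ry)] = min(rx, ry)

    for s, p, o in graph:
        if p == "owl:sameAs":
            union(s, o)
    # Rewrite subjects and objects onto canonical nodes; the sameAs links themselves are dropped.
    return {(find(s), p, find(o)) for s, p, o in graph if p != "owl:sameAs"}

result = smush(merged | wikipedia | links)
for triple in sorted(result):
    print(triple)
```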
    • 40. Is that surprising? It may look like it but, in fact, it should not be… What happened via automatic means is done every day by Web users! The difference: a bit of extra rigour so that machines can do this, too.
    • 41. What did we do? We combined different data sets that are of different formats (RDBMS, Excel spreadsheet, (X)HTML, etc.), may be internal or somewhere on the Web, and have different names for the same relations. We could combine the data because some URIs were identical (the ISBNs, in this case). We could add some simple additional information (the "glue") to help further merge the data sets. The result? We can answer queries that could not previously be asked.
    • 42. What did we do? (cont'd)
    • 43. The abstraction pays off because… the graph representation is independent of the details of the native structures; a change in local database schemas, HTML structures, etc. does not affect the whole ("schema independence"); new data and new connections can be added seamlessly and incrementally; it doesn't matter whether you are at the enterprise level or at the web level.
    • 44. So where is the Semantic Web? Semantic Web technologies make such integration possible.
    • 45. Semantic Technologies today: applications, use cases, technologies, systems
    • 46. Web of data today
    • 47. Semantics today: LinkedIn; GoodRelations; Oracle (Server); IBM (DB2, Watson); Apple (Siri); SAP; Evri and many startups; many deployed systems
    • 48. Semantic Web Technologies: a set of technologies and frameworks that enable semantic data management, data integration and the web of data
      • Resource Description Framework (RDF)
      • A variety of data interchange formats (e.g., RDF/XML, N3, Turtle, N-Triples)
      • Semantic languages such as RDF Schema (RDFS), the Web Ontology Language (OWL) and rules (SWRL)
      • A query language (SPARQL)
      • Software infrastructure (RDF/SPARQL frameworks, triple stores, data integrators, query engines, reasoners)
      • Publicly available connected data sets and open data initiatives (LOD)
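Of the interchange formats listed, N-Triples is the simplest: one triple per line, IRIs in angle brackets, literals in double quotes, and a terminating period. The sketch below writes plain string literals only (no language tags or datatypes) and assumes the IRIs are already valid; the IRI class and the example.org IRIs are assumptions for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IRI:
    value: str

def escape_literal(text):
    # Escape the backslash first so the escapes added afterwards are not doubled.
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))

def term(t):
    """Serialize one term: IRIs in angle brackets, anything else as a plain string literal."""
    return f"<{t.value}>" if isinstance(t, IRI) else f'"{escape_literal(t)}"'

def to_ntriples(triples):
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in triples)

triples = [
    (IRI("urn:isbn:000651409X"), IRI("http://example.org/a#title"), "The Glass Palace"),
    (IRI("urn:isbn:000651409X"), IRI("http://example.org/a#author"), IRI("http://example.org/a#id_xyz")),
]
print(to_ntriples(triples), end="")
```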
    • 49. SWT Part I
      • The data model (RDF)
      • The query language (SPARQL)
      • Software development (architecture, frameworks and tools)
      • A little more semantics (RDFS, inference techniques, tools and data integration)
      • Interacting with the enterprise (legacy sources, XML, DBMS, mappings)
      • More complex semantics (rules, data integration and reasoning with rules)
    • 50. Reading material: PTSW Chapter 1; SWP Part I, Chapter 1; FSW Section 1.4