Linked Data and the Semantic Web - Mimas Seminar


  • Screenscraping, Google algorithms, but still not ideal
  • Context of openness – MPs’ expenses etc.
  • The Central Office of Information (COI) is the Government's centre of excellence for marketing and communications.
  • site, which simply provides access to raw data (Excel spreadsheets, PDF files, and more), the UK is adhering closely to Berners-Lee’s Linked Data rules and making data available in formats such as RDF where feasible.
  • Next few slides are from demos at the launch on 21st January 2010
  • This is about open data, not linked data
  • Newspaper
  • Principles underpinning the technology
  • Step back a bit to HTML. The HTML web of documents doesn’t encourage re-use or reduce redundancy. There are network effects, but they could be much better.
  • Note this is a considerable simplification of the detail, and in danger of being misleading. Linked data exploits semantically meaningful tagging to encourage re-use, reduce redundancy etc.
  • Uses predicate logic. Goes back to Aristotle. Conceptualises things, and the relationships between things
  • SparqPlug (Coetzee, Heath and Motta, 2008) is a service that enables the extraction of Linked Data from legacy HTML documents on the Web that do not contain RDF data. The service operates by serialising the HTML DOM as RDF and allowing users to define SPARQL queries that transform elements of this into an RDF graph of their choice.
  • D2R - Using a declarative mapping language, the data publisher defines a mapping between the relational schema of the database and the target RDF vocabulary. Based on the mapping, D2R server publishes a Linked Data view over the database and allows clients to query the database via the SPARQL protocol.
  • Just as traditional Web browsers allow users to navigate between HTML pages by following hypertext links, Linked Data browsers allow users to navigate between data sources by following links expressed as RDF triples. Linked Data search engines provide keyword-based search services oriented towards human users, and follow a similar interaction paradigm to existing market leaders such as Google and Yahoo.
  • Dots indicate provenance. The colour of the dots doesn’t seem to be of significance.
  • Falcons and SWSE provide a more detailed interface to the user that exploits the underlying structure of the data. Both provide a summary of the entity the user selects from the results list, alongside additional structured data crawled from the Web and links to related entities. Falcons provides users with the option of searching for objects, concepts and documents, each of which leads to a slightly different presentation of results.
  • Sindice (Oren et al., 2008) and Watson provide APIs through which Linked Data applications can discover RDF documents on the Web that reference a certain URI or contain certain keywords. The rationale for such services is that each new Linked Data application should not need to implement its own infrastructure for crawling and indexing all parts of the Web of Data of which it might wish to make use. Instead, applications can query these indexes to receive pointers to potentially relevant documents which can then be retrieved and processed by the application itself.
  • Was covered at the CETIS conference in 2009. I’d be interested to get any ideas on this.

    1. 1. UKOLN is supported by: Linked Data and the Semantic Web - What are they and should I care? 17th February 2010 MIMAS Discussion Forum University of Manchester, UK Adrian Stevenson
    2. 2. <ul><li>semantics is … devoted to the study of meaning … on the syntactic levels of words, phrases, sentences </li></ul><ul><li> </li></ul>
    3. 3. <ul><li>“The Semantic Web is a web of data, in some ways like a global database” 1 </li></ul><ul><li>“first step is putting data on the Web in a form that machines can naturally understand... This creates what I call a Semantic Web - a web of data that can be processed directly or indirectly by machines” 2 </li></ul><ul><li>1. </li></ul><ul><li>2. Tim Berners-Lee, Weaving the Web. Harper, San Francisco. 1999. </li></ul>
    4. 4. <ul><li>“The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web.” </li></ul><ul><li>“the Semantic Web is the goal or end result… Linked Data provides the means to reach that goal” </li></ul><ul><li>From ‘Linked Data: The Story So Far’ - Heath, Bizer and Berners-Lee 2009 </li></ul>
    5. 5. The Web We’re Used To <ul><li>Made by humans for humans </li></ul><ul><li>Primarily documents </li></ul><ul><li>Machines not very welcome </li></ul><ul><li>Data silos </li></ul>
    6. 6. Web of Linked Data <ul><li>In 1998 the idea from Tim Berners-Lee of ‘linked data’ took shape </li></ul><ul><li>Designed for machines first </li></ul><ul><li>It primarily links data about ‘things’, not documents </li></ul><ul><li>… but it is for humans in the end </li></ul>
    7. 7. <ul><li>But haven’t we been putting data on the web for years? </li></ul><ul><ul><li>In CSV, relational databases, XML etc.? </li></ul></ul><ul><li>Well yes, but these approaches are not so easy to integrate </li></ul><ul><li>Web 2.0 mashups work against a fixed set of data sources </li></ul><ul><li>Linked Data applications operate on top of an unbound, global data space. </li></ul>
    8. 8. So what’s happening now?
    9. 10. <ul><li>“Sir Tim Berners-Lee, the inventor of the world wide web, will help the British government to make its data more easily available online … I have asked Sir Tim Berners-Lee … to help us drive the opening up of access to Government data in the web” Prime Minister Gordon Brown, 10th June 2009 </li></ul><ul><li>“What you find if you deal with people in government departments is that they hug their database, hold it really close.” Tim Berners-Lee, 10th June 2009 </li></ul><ul><li>We shall see … </li></ul>
    10. 12. Officially launched 21st January 2010
    11. 13. – search for ‘traffic’
    12. 14. Central Office of Information -
    13. 15. BBC Music BETA
    14. 16. <ul><li>Provides access to raw data (Excel spreadsheets, PDF files, and more) </li></ul><ul><li>UK is adhering more closely to Berners-Lee’s Linked Data rules </li></ul>
    15. 17.
    16. 18.
    17. 19. Graphs house prices over time - combines house price data with information from Yahoo! Placemaker, Nestoria and OpenStreetMap
    18. 20. Effect of congestion charge zones on increasing the number of bicycles and reducing the number of cars and taxis – from ITO World /
    19. 22. Postcode Paper - bus timetables, doctors surgeries, allotments
    20. 23. Owls Near You -
    21. 25.
    22. 26. A little bit of the techy stuff
    23. 27. Linked Data is … <ul><li>A way of publishing data on the web that: </li></ul><ul><ul><li>Encourages reuse </li></ul></ul><ul><ul><li>Reduces redundancy </li></ul></ul><ul><ul><li>Maximises inter-connectedness </li></ul></ul><ul><ul><li>Enables network effects </li></ul></ul><ul><li>So how is this achieved? </li></ul>
    24. 28. Presentational tagging – HTML <ul><li><h1>Agilitas Physiotherapy Centre</h1> <p>Welcome to the Agilitas Physiotherapy Centre home page. Do you feel pain? Have you had an injury? Let our staff Lisa Davenport, our secretary Kelly Townsend, and Steve Matthews take care of your body and soul.</p> <h2>Consultation hours</h2> Mon 11am - 7pm<br/> Tue 11am - 7pm<br/> Wed 3pm - 7pm<br/> Thu 11am - 7pm<br/> Fri 11am - 3pm </li></ul><ul><li><p> But note that we do not offer consultation during the weeks of the <a href=&quot;. . .&quot;>State Of Origin</a> games.</p> </li></ul>
    25. 29. Semantic tagging <ul><li><company> </li></ul><ul><li><treatmentOffered>Physiotherapy</treatmentOffered> </li></ul><ul><li><companyName>Agilitas Physiotherapy Centre</companyName> </li></ul><ul><li><staff> </li></ul><ul><li><therapist>Lisa Davenport</therapist> <therapist>Steve Matthews</therapist> </li></ul><ul><li><secretary>Kelly Townsend</secretary> </li></ul><ul><li></staff> </li></ul><ul><li></company> </li></ul>
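
The difference between the two tagging slides above can be sketched in code. With semantic tags, a program can ask for the therapists directly instead of guessing from prose. This is a minimal illustration using Python's standard library; the XML is the slide's own example, while the variable names are assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET

# The semantically tagged example from the slide above.
doc = """
<company>
  <treatmentOffered>Physiotherapy</treatmentOffered>
  <companyName>Agilitas Physiotherapy Centre</companyName>
  <staff>
    <therapist>Lisa Davenport</therapist>
    <therapist>Steve Matthews</therapist>
    <secretary>Kelly Townsend</secretary>
  </staff>
</company>
""".strip()

root = ET.fromstring(doc)

# Roles are explicit in the markup, so no text-mining of prose is needed.
therapists = [t.text for t in root.findall("./staff/therapist")]
secretary = root.findtext("./staff/secretary")

print(therapists)
print(secretary)
```

The same question asked of the presentational HTML version would need screenscraping, because nothing in `<p>` or `<h1>` says who is a therapist and who is a secretary.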
    26. 30. Tim BL’s Linked Data Design Issues <ul><li>Use URIs as names for things </li></ul><ul><li>Use HTTP URIs so that people can look up those names. </li></ul><ul><li>When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL) </li></ul><ul><li>Include links to other URIs so that they can discover more things. </li></ul><ul><li>From </li></ul>
    27. 31. URIs and HTTP <ul><li>A “Uniform Resource Identifier (URI) provides a simple and extensible means for identifying a resource” – RFC 3986 </li></ul><ul><ul><li>A URL is a type of URI </li></ul></ul><ul><ul><li>HTTP URIs can be ‘de-referenced’ </li></ul></ul><ul><li>HTTP URIs are used for “real world” things </li></ul><ul><ul><li> </li></ul></ul><ul><ul><li> </li></ul></ul>
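
To make 'de-referencing' concrete: a client looks up an HTTP URI and uses the `Accept` header to ask for data rather than a web page. The sketch below only builds the request and does not send it; the URI is a hypothetical placeholder, and `build_rdf_request` is a helper name invented for this illustration.

```python
from urllib.request import Request


def build_rdf_request(uri: str) -> Request:
    """Prepare (but do not send) an HTTP GET that asks for RDF instead of HTML."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})


# Hypothetical URI naming a real-world thing; it does not come from the slides.
req = build_rdf_request("http://example.org/id/person/alice")

print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; whether RDF comes back depends on the server supporting content negotiation.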
    28. 32. RDF <ul><li>Resource Description Framework </li></ul><ul><ul><li>“a language for representing information about resources in the World Wide Web” </li></ul></ul><ul><ul><li>“RDF can also be used to represent information about things that can be identified on the Web, even when they cannot be directly retrieved on the Web” </li></ul></ul><ul><li>Describes relations based on triples </li></ul><ul><ul><li>Subject-predicate-object </li></ul></ul><ul><li> </li></ul>
    29. 33.
    30. 34. <ul><li>Heroes </li></ul><ul><li>has a </li></ul><ul><li>creator </li></ul><ul><li> whose name is </li></ul><ul><li>David Bowie </li></ul>Subject Predicate Object
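
The 'Heroes has a creator whose name is David Bowie' statement can be written down as a single triple. The sketch below is a simplification: it treats the name as a plain literal object, the subject URI is a hypothetical placeholder, and only the predicate (the Dublin Core `creator` term) is a real vocabulary URI. The `Triple` type and `to_ntriples` helper are names invented for the illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the thing being described
    predicate: str  # URI of the relationship
    obj: str        # a plain literal here; in general it could also be a URI


def to_ntriples(t: Triple) -> str:
    """Serialise a triple with a plain literal object as one N-Triples line."""
    escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{t.subject}> <{t.predicate}> "{escaped}" .'


heroes = Triple(
    "http://example.org/album/heroes",          # hypothetical subject URI
    "http://purl.org/dc/elements/1.1/creator",  # Dublin Core 'creator'
    "David Bowie",
)

print(to_ntriples(heroes))
```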
    31. 35. Linked Data in Use
    32. 36. Publishing Linked Data <ul><li>RDFizers – convert data formats into RDF </li></ul><ul><li>D2R Server – creates linked data from relational databases </li></ul><ul><li>SparqPlug – extracts linked data from HTML </li></ul><ul><li>… many others </li></ul>
    33. 39. D2R server publishes Linked Data view of database and allows clients to query the database via SPARQL
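
The idea of exposing a relational table as triples can be sketched by hand. This is not D2R's mapping language or API, only an illustration of the underlying idea: each row becomes a subject URI and each column a predicate. The base URI, table, and `rows_to_triples` helper are all assumptions made for the example.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI for minted identifiers


def rows_to_triples(conn, table, key, columns):
    """Yield (subject, predicate, object) for each non-key column of each row.

    Table and column names are interpolated into SQL, so they must come from
    trusted configuration, never from user input.
    """
    cur = conn.execute(f"SELECT {key}, {', '.join(columns)} FROM {table}")
    for row in cur:
        subject = f"{BASE}{table}/{row[0]}"
        for col, value in zip(columns, row[1:]):
            yield (subject, f"{BASE}vocab/{col}", value)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE album (id INTEGER PRIMARY KEY, title TEXT, artist TEXT)")
conn.execute("INSERT INTO album VALUES (1, 'Heroes', 'David Bowie')")

triples = list(rows_to_triples(conn, "album", "id", ["title", "artist"]))
for t in triples:
    print(t)
```

A real deployment would map columns onto an existing RDF vocabulary rather than minting predicates from column names.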
    34. 40. Linked Data Applications <ul><li>Linked Data Browsers – navigate between data sources </li></ul><ul><ul><li>Disco </li></ul></ul><ul><ul><li>Tabulator </li></ul></ul><ul><ul><li>Marbles </li></ul></ul><ul><li>Linked Data Search Engines </li></ul><ul><ul><li>For humans – Falcons, SWSE </li></ul></ul><ul><ul><li>For apps – Swoogle, Sindice </li></ul></ul>
    35. 41. <ul><li>Tracks provenance of data </li></ul><ul><li>Merges data about the same thing from different sources </li></ul>
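
Merging data about the same thing while keeping track of where each statement came from can be sketched as a map from triple to sources. The source URLs, the short predicate names, and the `merge_with_provenance` helper are hypothetical; this is not how any particular browser implements it.

```python
from collections import defaultdict


def merge_with_provenance(sources):
    """Map each distinct triple to the set of sources that asserted it."""
    merged = defaultdict(set)
    for source_uri, triples in sources.items():
        for triple in triples:
            merged[triple].add(source_uri)
    return dict(merged)


# Hypothetical data: two sources describing the same URI.
thing = "http://example.org/id/heroes"
sources = {
    "http://source-a.example/data": [
        (thing, "title", "Heroes"),
        (thing, "creator", "David Bowie"),
    ],
    "http://source-b.example/data": [
        (thing, "creator", "David Bowie"),
        (thing, "released", "1977"),
    ],
}

merged = merge_with_provenance(sources)
for triple, origins in merged.items():
    print(triple, sorted(origins))
```

Because both sources use the same URI for the thing, the duplicate `creator` statement collapses into one entry with two origins.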
    36. 42. <ul><li>User can explore the underlying data structures </li></ul><ul><li>Can search for objects, concepts or documents </li></ul>
    37. 43. <ul><li>Provides interface (API) that other linked data apps can use </li></ul><ul><li>Rationale: new linked data apps shouldn’t need to implement their own infrastructure for crawling and indexing web of data </li></ul>
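
The rationale above amounts to a shared inverted index: given a URI, return the documents that mention it. The toy class below illustrates that lookup in memory; it is not the API of Sindice or any other real service, and all URLs and names are made up for the sketch.

```python
from collections import defaultdict


class UriIndex:
    """Toy inverted index from URIs to the documents that mention them."""

    def __init__(self):
        self._docs_by_uri = defaultdict(set)

    def add_document(self, doc_url, mentioned_uris):
        """Record that doc_url references each of the given URIs."""
        for uri in mentioned_uris:
            self._docs_by_uri[uri].add(doc_url)

    def lookup(self, uri):
        """Return the sorted document URLs that reference the given URI."""
        return sorted(self._docs_by_uri.get(uri, ()))


index = UriIndex()
index.add_document("http://a.example/doc.rdf", ["http://example.org/id/heroes"])
index.add_document(
    "http://b.example/doc.rdf",
    ["http://example.org/id/heroes", "http://example.org/id/bowie"],
)

print(index.lookup("http://example.org/id/heroes"))
```

An application would then fetch and process the returned documents itself, as the slide describes.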
    38. 44.
    39. 45. Some issues <ul><li>To RDF or not to RDF </li></ul><ul><li>Usability </li></ul><ul><li>Sustainability </li></ul><ul><li>Provenance </li></ul><ul><li>Licensing </li></ul><ul><li>Reliability </li></ul>
    40. 52. Sustainability <ul><li>Ed Summers at the Library of Congress created </li></ul><ul><li>Linked Data interface for LOC subject headings </li></ul><ul><li>People started using it </li></ul>
    41. 53. Library of Congress Subject Headings
    42. 55. Data Licensing <ul><li>Uses Amazon Web Services but contravenes their terms and conditions </li></ul>
    43. 56. Provenance <ul><li>OK if data ‘watermarked’ </li></ul><ul><li>But can often be a problem </li></ul><ul><li>VoID can help (apparently!) </li></ul>
    44. 58. The Business Case <ul><li>Can we convince IT Managers, VC etc. it’s worth it? </li></ul><ul><ul><li>Realistic expectations </li></ul></ul><ul><ul><li>“..the people sort of in charge of the kind of data thing knew so little about their data structures” </li></ul></ul><ul><ul><li>“I’ve had a whole bunch of meetings to get one dataset, been fobbed off, and literally just never get anywhere” Tom Steinberg, Director of MySociety (from Nodalities issue 8) </li></ul></ul>
    45. 59. The Business Case <ul><li>What’s the payoff for O’Reilly, BBC etc. of using Linked Data? </li></ul><ul><li>Why didn’t it work the first time? </li></ul><ul><ul><li>What’s different now? </li></ul></ul><ul><ul><li>Need to work out what Linked Data does that other things don’t </li></ul></ul><ul><ul><li>Prove a simple, tangible benefit </li></ul></ul>
    46. 60.
    47. 61. Universities and Colleges in the Giant Global Graph <ul><li>Session at CETIS Conference 2009 </li></ul><ul><li>Case for Linked Data / Semantic Web discussed </li></ul><ul><li>Some cases: </li></ul><ul><ul><li>Freedom of Information </li></ul></ul><ul><ul><li>Improves data quality </li></ul></ul><ul><ul><li>Joining the party </li></ul></ul>
    48. 62.
    49. 63. Conclusion <ul><li>Some interesting recent developments and sense of momentum </li></ul><ul><li>Central Gov’t interested </li></ul><ul><li>… but still much to do if the semantic web and linked data are to really take hold </li></ul>
    50. 64. Questions? <ul><li> </li></ul><ul><li>[email_address] </li></ul>
    51. 65. CC Attribution <ul><li>Some sections of this presentation adapted from: </li></ul><ul><ul><li>An Introduction to Linked Data, by Tom Heath </li></ul></ul><ul><ul><li>The Semantic Web – An Introduction, by Owen Stephens </li></ul></ul><ul><ul><li>Using Linked Data as a Learning Resource Recommendation System, by Chris Clarke </li></ul></ul><ul><li>This presentation is available under a Creative Commons Noncommercial-Share Alike licence </li></ul>