Screenscraping, Google algorithms, but still not ideal
Context of openness – MPs' expenses etc
The Central Office of Information (COI) is the Government's centre of excellence for marketing and communications.
data.gov site, which simply provides access to raw data (Excel spreadsheets, PDF files, and more), the UK is adhering closely to Berners-Lee’s Linked Data rules and making data available in formats such as RDF where feasible.
Next few slides from demos at data.gov.uk launch Jan 21st 2010
This is about open data, not linked data
Principles underpinning the technology
Step back a bit to HTML. The HTML web of documents doesn’t encourage re-use or reduce redundancy. There are network effects, but they could be much better.
Note this is a considerable simplification of the detail, in danger of misleading. Linked data exploits semantically meaningful tagging to encourage re-use, reduce redundancy etc.
Uses predicate logic. Goes back to Aristotle. Conceptualises things, and the relationships between things
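To make "things, and the relationships between things" concrete, here is a minimal sketch of subject-predicate-object triples in plain Python. The `http://example.org/` namespace, the predicate names and the `match` helper are all made up for illustration; a real application would more likely use an RDF library and published vocabularies.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # the thing being described
    predicate: str  # the relationship
    obj: str        # another thing, or a literal value

EX = "http://example.org/"  # hypothetical namespace, not a real vocabulary

triples = [
    Triple(EX + "TimBernersLee", EX + "invented", EX + "WorldWideWeb"),
    Triple(EX + "TimBernersLee", EX + "wrote", EX + "WeavingTheWeb"),
    Triple(EX + "WeavingTheWeb", EX + "publishedIn", "1999"),
]

def match(store, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [t for t in store
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)]

# Everything the store says about one thing.
things_about_tim = match(triples, s=EX + "TimBernersLee")
for t in things_about_tim:
    print(t.predicate, "->", t.obj)
```

Because every statement has the same three-part shape, statements from different sources can be merged into one list and queried with the same pattern matching.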
SparqPlug (Coetzee, Heath and Motta, 2008) is a service that enables the extraction of Linked Data from legacy HTML documents on the Web that do not contain RDF data. The service operates by serialising the HTML DOM as RDF and allowing users to define SPARQL queries that transform elements of this into an RDF graph of their choice.
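The two steps described here (serialise the DOM as triples, then query those triples to build a new graph) can be sketched with the Python standard library. This is not SparqPlug's code or vocabulary: the `dom:` predicate names, the `ex:title` predicate and the page URI are assumptions, and a plain Python filter stands in for the SPARQL query.

```python
from html.parser import HTMLParser

class DomToTriples(HTMLParser):
    """Turn each element into a node and record tag, parent/child and text."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self.stack = []    # currently open elements
        self.counter = 0

    def handle_starttag(self, tag, attrs):
        node = "_:n{}".format(self.counter)  # blank-node style identifier
        self.counter += 1
        self.triples.append((node, "dom:tag", tag))
        if self.stack:
            self.triples.append((self.stack[-1], "dom:child", node))
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Assumes well-nested markup; real-world HTML needs more care.
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.triples.append((self.stack[-1], "dom:text", text))

def extract_titles(dom_triples, page_uri):
    """Stand-in for a SPARQL query: map every <h1> text to an ex:title triple."""
    h1_nodes = {s for s, p, o in dom_triples if p == "dom:tag" and o == "h1"}
    return [(page_uri, "ex:title", o)
            for s, p, o in dom_triples
            if p == "dom:text" and s in h1_nodes]

parser = DomToTriples()
parser.feed("<h1>Agilitas Physiotherapy Centre</h1><p>Welcome</p>")
title_triples = extract_titles(parser.triples, "http://example.org/agilitas")
print(title_triples)
```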
D2R - Using a declarative mapping language, the data publisher defines a mapping between the relational schema of the database and the target RDF vocabulary. Based on the mapping, D2R server publishes a Linked Data view over the database and allows clients to query the database via the SPARQL protocol.
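As a rough illustration of the mapping idea only, the sketch below turns rows of a relational table into triples using a declarative description. The `MAPPING` dictionary is an invented format, not D2R's actual mapping language, and the `staff` table, base URI and `vocab/role` predicate are assumptions; only the FOAF `name` property is a real vocabulary term. The SPARQL endpoint that D2R Server provides is not covered.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI

# Invented declarative mapping: which table, how to mint URIs, which
# column becomes which RDF predicate.
MAPPING = {
    "table": "staff",
    "key": "id",
    "uri_pattern": BASE + "staff/{id}",
    "columns": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "role": BASE + "vocab/role",
    },
}

def rows_to_triples(conn, mapping):
    """Apply the mapping to every row and return (subject, predicate, object) tuples."""
    cols = [mapping["key"]] + list(mapping["columns"])
    sql = "SELECT {} FROM {} ORDER BY {}".format(
        ", ".join(cols), mapping["table"], mapping["key"])
    triples = []
    for row in conn.execute(sql):
        record = dict(zip(cols, row))
        subject = mapping["uri_pattern"].format(**record)
        for col, predicate in mapping["columns"].items():
            if record[col] is not None:  # NULL columns produce no triple
                triples.append((subject, predicate, record[col]))
    return triples

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, role TEXT)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)",
                 [(1, "Kelly Townsend", "secretary"), (2, "Steve Matthews", None)])
staff_triples = rows_to_triples(conn, MAPPING)
for triple in staff_triples:
    print(triple)
```

The point of keeping the mapping declarative is that the database itself is untouched; the RDF view is computed from it on request.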
Just as traditional Web browsers allow users to navigate between HTML pages by following hypertext links, Linked Data browsers allow users to navigate between data sources by following links expressed as RDF triples. Linked Data search engines provide keyword-based search services oriented towards human users, and follow a similar interaction paradigm to existing market leaders such as Google and Yahoo.
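The "follow links expressed as RDF triples" behaviour can be sketched as a small breadth-first walk. The `WEB` dictionary below is a made-up stand-in for dereferencing URIs over HTTP, and all the URIs and predicate names are hypothetical.

```python
from collections import deque

# Hypothetical web of data: looking up a URI yields the triples published there.
WEB = {
    "http://a.example/alice": [
        ("http://a.example/alice", "knows", "http://b.example/bob"),
    ],
    "http://b.example/bob": [
        ("http://b.example/bob", "basedNear", "http://c.example/manchester"),
    ],
    "http://c.example/manchester": [
        ("http://c.example/manchester", "label", "Manchester"),
    ],
}

def dereference(uri):
    """Stand-in for an HTTP GET that returns RDF; unknown URIs yield nothing."""
    return WEB.get(uri, [])

def browse(start, max_hops=2):
    """Collect triples reachable from start by following object URIs."""
    seen = {start}
    queue = deque([(start, 0)])
    collected = []
    while queue:
        uri, hops = queue.popleft()
        for s, p, o in dereference(uri):
            collected.append((s, p, o))
            # An object that is itself a URI is a link into another data source.
            if o.startswith("http://") and o not in seen and hops < max_hops:
                seen.add(o)
                queue.append((o, hops + 1))
    return collected

found = browse("http://a.example/alice", max_hops=1)
print(len(found), "triples found within one hop")
```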
Dots indicate provenance. The colour of the dots doesn’t seem to be of significance.
Falcons provide a more detailed interface to the user that exploits the underlying structure of the data. Both provide a summary of the entity the user selects from the results list, alongside additional structured data crawled from the Web and links to related entities. Falcons provides users with the option of searching for objects, concepts and documents, each of which leads to slightly different presentation of results.
Sindice (Oren et al., 2008) and Watson provide APIs through which Linked Data applications can discover RDF documents on the Web that reference a certain URI or contain certain keywords. The rationale for such services is that each new Linked Data application should not need to implement its own infrastructure for crawling and indexing all parts of the Web of Data of which it might wish to make use. Instead, applications can query these indexes to receive pointers to potentially relevant documents which can then be retrieved and processed by the application itself.
Was covered at CETIS conference in 2009. I’d be interested to get any ideas on this.
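The kind of lookup such an index offers can be sketched as an inverted index from URIs and keywords to the documents that mention them. This does not call or reproduce the Sindice or Watson APIs; the `CRAWLED` data, the document URLs and the function names are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical crawl result: document URL -> triples found in that document.
CRAWLED = {
    "http://a.example/doc1.rdf": [
        ("http://example.org/tim", "http://example.org/vocab/name", "Tim Berners-Lee"),
    ],
    "http://b.example/doc2.rdf": [
        ("http://example.org/book", "http://example.org/vocab/author", "http://example.org/tim"),
    ],
    "http://c.example/doc3.rdf": [
        ("http://example.org/book", "http://example.org/vocab/title", "Weaving the Web"),
    ],
}

def build_index(crawled):
    """Index every URI and every literal keyword by the documents containing it."""
    uri_index = defaultdict(set)
    keyword_index = defaultdict(set)
    for doc, triples in crawled.items():
        for triple in triples:
            for term in triple:
                if term.startswith("http://"):
                    uri_index[term].add(doc)
                else:
                    for word in term.lower().split():
                        keyword_index[word].add(doc)
    return uri_index, keyword_index

uri_index, keyword_index = build_index(CRAWLED)

def documents_mentioning(uri):
    """Pointers to documents that reference the URI; the caller fetches them itself."""
    return sorted(uri_index.get(uri, set()))

def documents_with_keyword(word):
    return sorted(keyword_index.get(word.lower(), set()))

print(documents_mentioning("http://example.org/tim"))
```

The index returns only pointers; retrieving and processing the documents is left to the application, as the note above describes.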
UKOLN is supported by:
Linked Data and the Semantic Web - What are they and should I care?
17th February 2010
MIMAS Discussion Forum, University of Manchester, UK
Adrian Stevenson
“The Semantic Web is a web of data, in some ways like a global database” 1
“first step is putting data on the Web in a form that machines can naturally understand... This creates what I call a Semantic Web - a web of data that can be processed directly or indirectly by machines” 2
2. Tim Berners-Lee, Weaving the Web. Harper, San Francisco. 1999.
“Sir Tim Berners-Lee, the inventor of the world wide web, will help the British government to make its data more easily available online … I have asked Sir Tim Berners-Lee … to help us drive the opening up of access to Government data in the web” Prime Minister Gordon Brown, 10th June 2009
“What you find if you deal with people in government departments is that they hug their database, hold it really close.” Tim Berners-Lee, 10th June 2009
<h1>Agilitas Physiotherapy Centre</h1>
<p>Welcome to the Agilitas Physiotherapy Centre home page. Do you feel pain? Have you had an injury? Let our staff Lisa Davenport, our secretary Kelly Townsend, and Steve Matthews take care of your body and soul.</p>
<h2>Consultation hours</h2>
Mon 11am - 7pm<br/>
Tue 11am - 7pm<br/>
Wed 3pm - 7pm<br/>
Thu 11am - 7pm<br/>
Fri 11am - 3pm
<p> But note that we do not offer consultation during the weeks of the <a href=". . .">State Of Origin</a> games.</p>