1. Department of Parliamentary Services
Parliamentary Library and Information Service
Linked Data:
thinking big, starting small
VALA
6 February 2014
Peter Neish
@peterneish
2. What will be covered
• Background
– What is Linked Data?
– Linked Data in Libraries and Government
• What we did
– Linked Data Workflow
• What did we get out of it?
3. What is Linked Data?
The triple statement: Subject, Predicate, Object (slightly simplified example)
• Denis Napthine (http://parliament.vic.gov.au/members/id/135)
– party (http://www.w3.org/ns/org#memberOf): Liberal Party
– hasRole: premier
– successorOf: Ted Baillieu
– elected: 1 October 1988
• Ted Baillieu
– party: Liberal Party
• Liberal Party (http://dbpedia.org/resource/Liberal_Party_of_Australia)
– successorOf: United Australia Party
– formationDate: 31 August 1945
7. Linked Data in Libraries
• OCLC – 1.2 million resources – 80 million triples
• LOC – Subject headings, authority files
• British Library – 2.8 million records, 93 billion triples
• BIBFRAME
• Schema Bib Extend Community Group
• LODLAM
8. Linked Data in Parliament and Government
– 6.4 billion triples of open government data
10. Project aims
• Is Linked Data useful in a local context?
• Explore the process of using Linked Data – where do you start?
• Being able to interrogate our data in new ways
• Use visualisation to gain new insights into data
11. Databases at Parliament
• People and Organisations
– Members of Parliament
– government agencies
• Documents
– media releases
– parliamentary debates (Hansard)
– newspaper clippings
– parliamentary papers
– party policies
• Media
– video and audio clips
12. Linked Data Workflow
• Preparation
– choose ontology
– investigate similar projects
• Clean and reconcile data
– clean data (cluster, facet)
– named entity extraction
– reconcile with other data
• Publish
– output RDF
– store data (files, triple store etc.)
13. Preparation
• Investigate similar projects
– Don’t reinvent the wheel
– Collaborate
• Choose an ontology (or build your own)
– Linked Data Open Vocabularies (lov.okfn.org)
14. Popolo Ontology
popoloproject.com
• developing open government specifications relating to the legislature
• prioritizes reuse over novelty
• attempts to make it easy to represent real-world data
• consensus model – open to contributions (W3C community group, GitHub)
17. Publish
• create RDF (Open Refine can do this too)
• store data
– separate files
– embedded in HTML
– database mapping using D2RQ
– triple store
18. What do we get out of it?
• Combined approach
– embedded data in catalogue
– Fuseki Triple Store
• Complex queries using SPARQL:
– what have previous Speakers been saying about the current issues in parliament?
– find all articles about transport that mention members of the Road Safety Committee
20. Links to related articles
21. Federal Preferences 2013 Election
22. Conclusion
• The process itself is valuable
• Aligning data with standards (Popolo Ontology)
• Cleaning and reconciling adds value to data
• Databases linked internally
• Can now provide Linked Data externally
23. Further Information
Linked Data best practice and recipes
• freeyourmetadata.org
• linkeddatabook.com
• euclid-project.eu
@peterneish
github.com/peterneish
Editor's Notes
Modelling data in such a way that computers can understand. As humans we have some understanding of what is meant by party and membership, but computers don’t, they are stupid – they need formal definitions of these things. Need identifiers – again, these are for computers, not for people. One database record could have hundreds of triple statements. Once it is linked in a graph and put on the web, lots of interesting things become possible.
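To make the triple idea concrete, here is a minimal sketch in plain Python (no RDF library). The two identifiers come from the slide; the short predicate names are the slide's simplified labels rather than terms from a real vocabulary, and "name" is an extra label assumed just for this sketch.

```python
# Identifiers copied from the slide; predicate names are the slide's
# simplified labels, and "name" is an extra label added for this sketch.
NAPTHINE = "http://parliament.vic.gov.au/members/id/135"
LIBERAL = "http://dbpedia.org/resource/Liberal_Party_of_Australia"

# Each statement is a (subject, predicate, object) triple.
TRIPLES = [
    (NAPTHINE, "name", "Denis Napthine"),
    (NAPTHINE, "party", LIBERAL),
    (NAPTHINE, "hasRole", "premier"),
    (LIBERAL, "name", "Liberal Party"),
    (LIBERAL, "formationDate", "31 August 1945"),
]


def match(subject=None, predicate=None, obj=None):
    """Return every triple that fits the pattern; None is a wildcard."""
    return [
        (s, p, o)
        for s, p, o in TRIPLES
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def party_formation_date(member):
    """Follow two links in the graph: member -> party -> formation date."""
    party = match(subject=member, predicate="party")[0][2]
    return match(subject=party, predicate="formationDate")[0][2]


print(party_formation_date(NAPTHINE))
```

Because the party is stored as an identifier rather than a string, the second lookup can carry on from where the first one finished. That is the "linked in a graph" part.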
Finding cheesecake recipes. Possible because data is marked up semantically behind the scenes. Trivial example, but search is one of the primary drivers of Linked Data technologies. Microsoft, Google and Yahoo have agreed on a schema – schema.org – which cannot be ignored.
I used to work at the Royal Botanic Gardens here in Melbourne, where we worked really hard with other botanic gardens to link up data across states. The problem was that names would vary across state boundaries. Linked Data was the answer, and this underpins the Atlas of Living Australia, which links up data on all living things held in Australian botanic gardens and museums.
The biomedical area has been an early adopter of Linked Data. By linking gene sequences, proteins, drugs and clinical trials, new discoveries can be made.
Libraries have been very active too, e.g. OCLC, Library of Congress, British Library. There are also groups working out the best ways to work with Linked Data and bibliographic records. BIBFRAME is concerned with using Linked Data to describe collections and the entire cataloguing process. Schema Bib Extend is concerned with making sure bibliographic information is discoverable by working to get bibliographic information encoded in schema.org. LODLAM – Linked Open Data in Libraries, Archives and Museums.
Move to Open Government, where governments release data they have collected (which has been paid for by the taxpayer). Makes governments more accountable. Others can use the data to build applications.
Non-government organisations have appeared that strive to make governments more accountable. OpenAustralia republishes Hansard from the federal parliament. Sunlight Foundation in the US has many applications that track activities in Congress. OpenNorth in Canada. mySociety in the UK.
How useful is Linked Data in a local context – is that enough to justify investing in the technology?
Databases are grouped into three related groups, including “People and organisations” and “Digital Resources”. Currently on a variety of platforms, mainly DB/Textworks, but also MySQL and KE Texpress.
Workflow to implement Linked Data – there are links to best practices at the end of the presentation.
Natural tendency to think that our institution is unique and has unique requirements – probably not the case. Investigate, find others doing the same thing and possibly collaborate. Choosing an ontology requires decisions to be made about how to describe the things in our database using a standard ontology. One tool that is useful is a site from the Open Knowledge Foundation that lets you search about 400 well-known ontologies.
Popolo was aligned with our data. Governments are complex – there are members, houses of parliament, legislation, acts, bills, speeches – and all this needs to be modelled in a standard way. If we can agree on a standard then we can collaborate on tools as well as the standards. We have been able to test our data against the ontology and provide feedback on where it falls short.
Open Refine (formerly Google Refine) is an amazing tool for cleaning and reconciling data. Excellent introductory and tutorial videos are available on the Open Refine site, http://openrefine.org/. In brief, you can cluster and facet to find duplicate values, slight misspellings or syntax errors.
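As a rough idea of what clustering does, the sketch below groups values that collapse to the same normalised key, in the spirit of a fingerprint-style key-collision method. It is a simplified stand-in written for these notes, not Open Refine's own code, and the sample values are made up.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Normalise a value: trim, lowercase, drop punctuation, sort unique tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values sharing a fingerprint; keep only groups with variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Made-up sample values with inconsistent case, spacing and word order.
names = ["Liberal Party", "liberal party ", "Party, Liberal", "Labor Party", "Nationals"]
clusters = cluster(names)
print(clusters)
```

Each cluster is a candidate set of duplicates for a person to review and merge; the method only catches variants that differ in case, punctuation, spacing or word order, not genuine misspellings.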
Reconciling is about taking values stored as strings in your database and linking them to an authoritative source – this could be DBpedia, Freebase or the Library of Congress Subject Headings. Your term can be matched against the authoritative source and the best match chosen. You now have the identifier that creates a link between your data and the authoritative source.
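A toy version of reconciliation can be sketched with Python's standard difflib module. The authority table is an assumption for illustration: only the Liberal Party identifier appears in this presentation, the example.org identifiers are placeholders, and a real reconciliation service scores candidates in its own way.

```python
import difflib

# Tiny stand-in for an authoritative source: label -> identifier.
# Only the first identifier comes from the slides; the others are placeholders.
AUTHORITY = {
    "Liberal Party of Australia": "http://dbpedia.org/resource/Liberal_Party_of_Australia",
    "Australian Labor Party": "http://example.org/id/australian-labor-party",
    "National Party of Australia": "http://example.org/id/national-party",
}


def reconcile(term, authority=AUTHORITY, cutoff=0.6):
    """Return (label, identifier) for the closest label, or None if nothing is close."""
    by_lowercase = {label.lower(): label for label in authority}
    best = difflib.get_close_matches(
        term.lower(), list(by_lowercase), n=1, cutoff=cutoff
    )
    if not best:
        return None
    label = by_lowercase[best[0]]
    return label, authority[label]


print(reconcile("Liberal Party"))
```

The cutoff of 0.6 is an arbitrary choice here; in practice, borderline matches are worth checking by hand before the identifier is stored.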
Open Refine can do this too. See links at end on best practices and guides for publishing Linked Data.
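For a sense of what the RDF output looks like, the sketch below writes triples in the line-based N-Triples syntax using plain Python. The member and party identifiers and org#memberOf come from the slides; the FOAF name property is an assumption added for the example, and treating any string that starts with "http" as an IRI is a shortcut that a real exporter would not rely on.

```python
NAPTHINE = "http://parliament.vic.gov.au/members/id/135"
LIBERAL = "http://dbpedia.org/resource/Liberal_Party_of_Australia"
ORG_MEMBER_OF = "http://www.w3.org/ns/org#memberOf"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"  # assumed choice of name property


def nt_term(term):
    """Format one term: IRIs in angle brackets, anything else as a quoted literal."""
    if term.startswith(("http://", "https://")):
        return f"<{term}>"
    escaped = term.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(
        f"{nt_term(s)} {nt_term(p)} {nt_term(o)} ." for s, p, o in triples
    )


triples = [
    (NAPTHINE, FOAF_NAME, "Denis Napthine"),
    (NAPTHINE, ORG_MEMBER_OF, LIBERAL),
]
print(to_ntriples(triples))
```

Output like this can be saved as a separate file or loaded into a triple store.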
Fuseki (Apache Jena project) using a TDB data store. How would these queries have been done before? First look up all the previous Speakers, then query our database for each of them and combine the results. Now our system knows which members have had the role of Speaker and we can use this in our query. SPARQL is a query language for querying triple store databases.
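The difference can be sketched in plain Python: once roles and speeches sit in the same graph, one function can join the two patterns instead of running a separate lookup per member. This imitates the kind of join a SPARQL query expresses; it is not SPARQL and does not talk to Fuseki, and every identifier and predicate name below is made up for illustration.

```python
# Made-up identifiers and predicate names, for illustration only.
TRIPLES = {
    ("member:1", "hasRole", "Speaker"),
    ("member:2", "hasRole", "Premier"),
    ("member:3", "hasRole", "Speaker"),
    ("speech:10", "spokenBy", "member:1"),
    ("speech:10", "about", "transport"),
    ("speech:11", "spokenBy", "member:2"),
    ("speech:11", "about", "transport"),
    ("speech:12", "spokenBy", "member:3"),
    ("speech:12", "about", "health"),
}


def subjects(predicate, obj):
    """All subjects that have the given predicate and object."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def speeches_by_role(role, topic):
    """Speeches on a topic made by any member who has held the given role."""
    holders = subjects("hasRole", role)
    return sorted(
        speech
        for speech in subjects("about", topic)
        if any((speech, "spokenBy", member) in TRIPLES for member in holders)
    )


print(speeches_by_role("Speaker", "transport"))
```

In a triple store the same question would be written as a single SPARQL graph pattern, with the store doing the join.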
Shows the topics of media releases from the last year – colour is for party and the size is proportional to the number of media releases. Can quickly see what the important topics are for the week, month etc.
We are using some semantic tools to identify entities in our content and can easily link these to other similar items.
With an election coming up in November this year, we are keen to explore how we can link our information with that from the electoral commission. We should be able to link up electorates, demographics, news items, candidates, policies etc. This visualisation was possible because the Australian Electoral Commission produces data on elections in a standard format. In this case I was able to use the data to get a better insight into how preferences were being swapped between parties.