Autodiscovery or The long tail of open data

Christopher Gutteridge's slides from Connected Data London. Christopher, an Open Data Architect at the University of Southampton, presented why and how people should employ an Open Data strategy at their organisation.

Published in: Technology

  1. 1. Autodiscovery or The long tail of open data Christopher Gutteridge University of Southampton & data.ac.uk
  2. 2. Bragsheet Christopher Gutteridge - @cgutteridge • Previously: Lead Developer of EPrints (open access research repository software). • “Linked Open Data Architect” for University of Southampton (or whatever we currently call doing LOD stuff for an organisation). • Benevolent technical dictator of data.ac.uk (recently deposed) • Webmaster WWW2006 • Assistant Webmaster WWW2007, WWW2009
  3. 3. Image Attributions: • Backgrounds: – http://www.fansshare.com/gallery/photos/14646865/abstract-background-brown-and-blue-circles/ – http://www.pptback.com/old-machine-gears-pptbackground.html • Cliff leap pic: Justin De La Ornellas @ Flickr • Train tracks: duncanh1 @ Flickr • Lego bricks: rawdonfox @ Flickr • Mechano Box: Lady alys @ Wikipedia • Stickle Bricks: Simon Jobling @ Flickr • Free Universal Construction Kit: F.A.T. Lab + Sy-Lab. • Telescope: Brongaeh @ Flickr • Pinata: Peasap @ Flickr • Containers: l2f1 @ Flickr
  4. 4. Why don’t organisations share data? (and what stops them)
  5. 5. We early adopters shared data because it’s cool. We were not 100% clear on the benefits, but it looked like fun and might gain us reputation.
  6. 6. Fear. Uncertainty. Doubt.
  7. 7. Open Data Excuse Bingo • Terrorists will use it • We'll get spam • It's too big • It's not very interesting • Thieves will use it • I don't mind, but someone else might • We will get too many enquiries • Lawyers want a custom licence • There's no API • Poor quality • There's already a project to... • We might want to use it in a paper • It's too complicated • Data Protection • People may misinterpret the data • What if we want to sell it later? Don’t get depressed! Go here for antidotes: http://is.gd/odbingo
  8. 8. Menu Burger ….. £3.50 Chips ….. £1.50 ≠
  9. 9. Greater than the sum of its parts
  10. 10. Interoperable datasets allow results that are greater than the sum of the parts.
  11. 11. http://bus.southampton.ac.uk/
  12. 12. 13
  13. 13. 14
  14. 14. 15
  15. 15. 16
  16. 16. http://www.minecraftworldmap.com/worlds/xO3X4/full#/4469/64/-1806/-3/0/0
  17. 17. data.southampton.ac.uk
  18. 18. Discrete Facts Statistics
  19. 19. What I want from data • Where am I going? • How can I get there? • Where can I get a coffee en route?
  20. 20. Why aren’t they using our data?
  21. 21. “If you build it, they will come.”
  23. 23. Probability of open dataset reuse = Value of dataset to audience × Potential audience size × Ease of discovery × Ease of grasping the value of the dataset × Ease of exploiting dataset
  24. 24. Probability of open dataset reuse = Value of dataset to audience × Potential audience size × Ease of discovery × Ease of grasping the value of the dataset × Ease of exploiting dataset × Perceived quality & reliability
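Because the slide's formula is a product, one weak factor drags the whole result down, and improving the weakest factor pays off proportionally. The sketch below illustrates that arithmetic only. Scoring each factor between 0 and 1, the factor names as dictionary keys, and the sample scores are all assumptions made for illustration; the slides give no numbers.

```python
from math import prod
from typing import Mapping


def reuse_probability(factors: Mapping[str, float]) -> float:
    """Multiply the factors together; each is assumed to be scored in [0, 1]."""
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return prod(factors.values())


# Hypothetical scores for a valuable dataset that is hard to find.
dataset = {
    "value_to_audience": 0.9,
    "potential_audience_size": 0.5,
    "ease_of_discovery": 0.1,
    "ease_of_grasping_value": 0.8,
    "ease_of_exploiting": 0.7,
    "perceived_quality_and_reliability": 0.9,
}

baseline = reuse_probability(dataset)
# Same dataset, but now easy to discover (for example via autodiscovery).
improved = reuse_probability({**dataset, "ease_of_discovery": 0.8})
# A dataset nobody can find is never reused, whatever its other merits.
hidden = reuse_probability({**dataset, "ease_of_discovery": 0.0})
```

In this made-up case, raising ease of discovery from 0.1 to 0.8 multiplies the overall result by eight, while a factor of zero wipes it out entirely.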
  25. 25. Autodiscoverable and interoperable data can massively increase the potential audience.
  26. 26. $ ./generate-world Demo --postcode PO381NL --size 250
  28. 28. data.ac.uk
  29. 29. • Automatically discovers equipment data from all .ac.uk sites – 2769 websites – 42 providing data – 11,028 records • Automation massively reduces staffing costs • Low effort for institutions – A third just provide a well-structured spreadsheet! • Not a single point of failure
  30. 30. 33
  31. 31. UK National Equipment Portal http://equipment.data.ac.uk
  32. 32. UNIQUIP spreadsheet columns (required?): • Type (no) • Name, Description (at least one of these fields must be completed) • Related Facility ID (no) • Technique(:cpv) or (:N8) (no) • Location (no) • Contact Name (no) • Contact Telephone, Contact URL, Contact Email (at least one of these fields must be completed) • Secondary Contact Name (no) • Secondary Contact Telephone, Secondary Contact URL, Secondary Contact Email (at least one of these fields must be completed with second contact name) • ID (no) • Photo (no) • Department (no) • Site Location (yes) • Building (no) • Service Level (no) • Web Address (no)
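The required-column rules on the UNIQUIP slide are simple enough to check automatically before a spreadsheet is published. The following is a minimal sketch, not the validator data.ac.uk actually runs. The column names come from the slide; the function names are hypothetical, and reading the secondary-contact rule as "if a second contact name is given, at least one of its telephone, URL or email must be given too" is an assumption.

```python
from typing import Dict, List

# Groups where at least one column must be completed (from the slide).
ANY_OF_GROUPS = [
    ("Name", "Description"),
    ("Contact Telephone", "Contact URL", "Contact Email"),
]
# Assumed to apply only when "Secondary Contact Name" is filled in.
SECONDARY_CONTACT = (
    "Secondary Contact Telephone",
    "Secondary Contact URL",
    "Secondary Contact Email",
)


def filled(row: Dict[str, str], column: str) -> bool:
    """True if the column is present and not blank."""
    return bool((row.get(column) or "").strip())


def validate_row(row: Dict[str, str]) -> List[str]:
    """Return a list of problems for one spreadsheet row (empty if none)."""
    problems = []
    if not filled(row, "Site Location"):
        problems.append("Site Location is required")
    for group in ANY_OF_GROUPS:
        if not any(filled(row, column) for column in group):
            problems.append("At least one of " + ", ".join(group) + " must be completed")
    if filled(row, "Secondary Contact Name") and not any(
        filled(row, column) for column in SECONDARY_CONTACT
    ):
        problems.append("Secondary Contact Name needs a telephone, URL or email")
    return problems


# Hypothetical row, as csv.DictReader would yield it.
example = {
    "Name": "Scanning electron microscope",
    "Contact Email": "lab@example.ac.uk",
    "Site Location": "Main campus",
}
print(validate_row(example))
```

Rows read with `csv.DictReader` can be passed straight to `validate_row`, which keeps the low-effort spreadsheet route usable while still catching incomplete records.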
  33. 33. 36 .ac.uk
  34. 34. Doin’ it on the cheap
  36. 36. Ensuring a sustainable service through autodiscovery
  37. 37. Sustainability via Autodiscovery • How do we add new datasets? • How are changes made? • How do we know the data is open data?
  38. 38. Sustainability via Autodiscovery • Have a machine-readable document describing the institution and any open datasets (with licences) • Place a link to it on the institution’s homepage
  39. 39. /.well-known/openorg http://www.soton.ac.uk/.well-known/openorg or <link rel="openorg" href="http://id.southampton.ac.uk/dataset/profile/latest">
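The two discovery routes on this slide can be sketched as a small function that, given a homepage's HTML, lists where a crawler might look for the profile document. This is an illustrative sketch using only the Python standard library, not the data.ac.uk crawler. The class and function names are hypothetical, and trying the `<link rel="openorg">` element before the `/.well-known/openorg` path is an assumed order; the slide only says "or".

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin


class OpenOrgLinkFinder(HTMLParser):
    """Remembers the href of the first <link rel="openorg"> element seen."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "openorg" in rel_values and attributes.get("href"):
            self.href = attributes["href"]


def candidate_profile_urls(homepage_url: str, homepage_html: str) -> List[str]:
    """List the URLs to try, most specific first."""
    finder = OpenOrgLinkFinder()
    finder.feed(homepage_html)
    candidates = []
    if finder.href:
        # Resolve relative hrefs against the homepage address.
        candidates.append(urljoin(homepage_url, finder.href))
    candidates.append(urljoin(homepage_url, "/.well-known/openorg"))
    return candidates


html = (
    '<html><head><link rel="openorg" '
    'href="http://id.southampton.ac.uk/dataset/profile/latest"></head></html>'
)
print(candidate_profile_urls("http://www.soton.ac.uk/", html))
```

Fetching the homepage and the candidate URLs is left out on purpose so the sketch runs without network access; a crawler would request each candidate in turn and keep the first one that returns a parseable profile.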
  41. 41. What is an Organisation Profile Document? An RDF document that describes the organisation: – General information provided: • Official name, Postal address, Contact phone number, The correct logo, Physical location – Links to the parts of the organisation: • Admissions, Alumni, Freedom of Information, Complaints – A semantic sitemap • Key pages such as jobs, news, events… – Links to the organisation’s discoverable open data sets and APIs • The equipment dataset
  42. 42. What is an Organisation Profile Document?
  43. 43. 46
  44. 44. Autodiscovery
  45. 45. Autodiscovery • Dataset publicly available on website. • Dataset has to be added manually along with all the institution’s details, contacts etc. Requires staff time (especially if any dataset changes location).
  46. 46. Autodiscovery • Dataset publicly available on website. • Dataset has to be added manually along with all the institution’s details, contacts etc. Requires staff time (especially if any dataset changes location). • Organisation has an OPD linking to dataset • The OPD has to be added manually, but the dataset location and institution info is consumed directly from the OPD. Requires less staff time (as any changes made to the OPD are picked up automatically).
  47. 47. Autodiscovery • Dataset publicly available on website. • Dataset has to be added manually along with all the institution’s details, contacts etc. Requires staff time (especially if any dataset changes location). • Organisation has an OPD linking to dataset • The OPD has to be added manually, but the dataset location and institution info is consumed directly from the OPD. Requires less staff time (as any changes made to the OPD are picked up automatically). • Link to OPD from organisation’s home page • OPD autodiscovered, so the dataset is automatically added to the service. Requires no staff time (as data is autodiscovered).
  48. 48. Never appeal to a man’s “better nature.” He may not have one. Invoking his “self-interest” gives you more leverage. - Robert Heinlein, “The Notebooks of Lazarus Long”
  49. 49. Status Report – Contributors and data statistics
  50. 50. Bronze / Silver / Gold criteria: • Data is on the internet and in an acceptable format (Bronze ✔, Silver ✔, Gold ✔) • Description of dataset is provided by a remotely hosted OPD (Silver ✔, Gold ✔) • The OPD is discovered via autodiscovery (Gold ✔) • The OPD/dataset has a recognised and supported open licence, e.g. CC0, ODCA or OGL (Gold ✔)
  51. 51. Bronze / Silver / Gold criteria: • Data is on the internet and in an acceptable format (Bronze ✔, Silver ✔, Gold ✔) • Description of dataset is provided by a remotely hosted OPD (Silver ✔, Gold ✔) • The OPD is discovered via autodiscovery (Gold ✔) • The OPD/dataset has a recognised and supported open licence, e.g. CC0, ODCA or OGL (Gold ✔) • All items in the dataset are assigned an ID code which is unique within the assigning organisation (Gold ✔)
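The Bronze, Silver and Gold criteria amount to a small decision procedure, sketched below. It assumes the tick marks on the slide are cumulative: one criterion for Bronze, the remotely hosted OPD added for Silver, and the remaining three all needed for Gold. The class, field and function names are hypothetical, and returning `None` for data that does not even reach Bronze is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetStatus:
    online_in_acceptable_format: bool
    described_by_remote_opd: bool
    opd_autodiscovered: bool
    open_licence: bool
    unique_item_ids: bool


def rating(status: DatasetStatus) -> Optional[str]:
    """Return 'Bronze', 'Silver', 'Gold', or None if no level is reached."""
    if not status.online_in_acceptable_format:
        return None
    if not status.described_by_remote_opd:
        return "Bronze"
    if status.opd_autodiscovered and status.open_licence and status.unique_item_ids:
        return "Gold"
    return "Silver"


# Hypothetical contributor: has an OPD, but it is not yet linked from the homepage.
almost_gold = DatasetStatus(
    online_in_acceptable_format=True,
    described_by_remote_opd=True,
    opd_autodiscovered=False,
    open_licence=True,
    unique_item_ids=True,
)
print(rating(almost_gold))
```

A status report built this way can tell each contributor which single missing criterion stands between them and the next level.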
  52. 52. Exploiting profile documents
  53. 53. Exploiting profile documents • We’ve barely begun • Let’s try a live demo....
  54. 54. Warning: Metaphor mixing detected
  55. 55. Needless heterogeneity means research doesn’t join up. Aligning datasets every time costs too much. Tools can’t be reused.
  56. 56. So what do we do about it?
  57. 57. Building easy-to-use tools to cross between formats, platforms and paradigms is very specialist work.
  58. 58. Building easy-to-use tools to cross between formats, platforms and paradigms is very specialist work. The solutions need to be discoverable.
  59. 59. Building easy-to-use tools to cross between formats, platforms and paradigms is very specialist work. The solutions need to be discoverable. Just putting it on GitHub is not making a tool discoverable!
  60. 60. Building easy-to-use tools to cross between formats, platforms and paradigms is very specialist work. The solutions need to be discoverable. Just putting it on GitHub is not making a tool discoverable! https://github.com/cgutteridge/
  61. 61. Organisation Datasets Well known formats available for: • Events • Publications • News headlines Nothing in common use for: • Staff Expertise • Programmes of Events • Vacancies • Organisational Structure • Buildings, Rooms • Points of service • Products – Food Menus
  62. 62. RDF or XML Vocabularies don’t solve the problem by themselves. You need: Examples to copy. Tools which consume and produce the format. Online checking tools.
  63. 63. A dataset should solve at least one use case. Over-modelling is fun. Stop it.
  64. 64. • TODO: • OPD DOCUMENTATION
  65. 65. Thank you. Christopher Gutteridge University of Southampton @cgutteridge cjg@ecs.soton.ac.uk http://opd.data.ac.uk/
