Linking Open Government Data at Scale

1. ExtendYourReach. Linking Open Government Data at Scale
YOW! 2016 Conference: Melbourne December 1-2 ~ Brisbane December 5-6 ~ Sydney December 8-9
Bernadette Hyland, CEO & co-founder, 3 Round Stones, Inc.
@BernHyland bhyland@3RoundStones.com

4. Some data is expensive to collect …

6. Data on the Web today

7. Lack of Context credit: http://mhausenblas.info

8. Required Context credit: http://mhausenblas.info

9. Linked data is intentionally for reuse

10. Linked Data refers to a set of best practices for publishing and interlinking data for access by both humans and machines. It builds on the RDF family of syntaxes (e.g., JSON-LD, N3, Turtle) and HTTP URIs.

11. Linked Data can be published by a person or organization behind the firewall or on the public Web. Linked Data published on the public Web is generally called Linked Open Data. - W3C Linked Data Glossary

12. Something --[a relationship]--> Something else

13. UQ --[is a]--> University

14. UQ --[label]--> The University of Queensland
    UQ --[is a]--> University
    UQ --[affiliation]--> Group of 8

15. UQ --[label]--> The University of Queensland
    UQ --[affiliation]--> Group of 8
    UQ --[number of undergraduate students]--> 34228
    UQ --[number of students]--> 48771
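
The graph built up on slides 12-15 can be written down as subject-predicate-object triples. What follows is a minimal Python sketch, not the speaker's code. The `dbo:` properties and the Group of Eight URI are the ones used in the query on slide 18; the UQ resource URI, the `dbo:University` class, and the helper names (`IRI`, `Literal`, `to_ntriples`) are assumptions made for illustration.

```python
from typing import NamedTuple


class IRI(NamedTuple):
    value: str


class Literal(NamedTuple):
    value: str
    lang: str = ""      # language tag such as "en"
    datatype: str = ""  # datatype IRI such as xsd:integer


DBO = "http://dbpedia.org/ontology/"
DBR = "http://dbpedia.org/resource/"
RDF_TYPE = IRI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
RDFS_LABEL = IRI("http://www.w3.org/2000/01/rdf-schema#label")
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Assumed DBpedia-style URI for the "UQ" node on the slides.
UQ = IRI(DBR + "University_of_Queensland")

TRIPLES = [
    (UQ, RDF_TYPE, IRI(DBO + "University")),
    (UQ, RDFS_LABEL, Literal("The University of Queensland", lang="en")),
    (UQ, IRI(DBO + "affiliation"),
     IRI(DBR + "Group_of_Eight_(Australian_universities)")),
    (UQ, IRI(DBO + "numberOfStudents"),
     Literal("48771", datatype=XSD_INTEGER)),
    (UQ, IRI(DBO + "numberOfUndergraduateStudents"),
     Literal("34228", datatype=XSD_INTEGER)),
]


def term_to_nt(term):
    """Render one term in N-Triples syntax."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    escaped = term.value.replace("\\", "\\\\").replace('"', '\\"')
    if term.lang:
        return f'"{escaped}"@{term.lang}'
    if term.datatype:
        return f'"{escaped}"^^<{term.datatype}>'
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(
        f"{term_to_nt(s)} {term_to_nt(p)} {term_to_nt(o)} ." for s, p, o in triples
    )


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
```

Every node and every relationship is named by an HTTP URI, so another publisher can make statements about the same university simply by reusing the same subject URI.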

16. credit: http://json-ld.org/
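
As a rough illustration of what a JSON-LD description can look like, the sketch below builds one for the UQ example from the preceding slides using only Python's standard `json` module. The `@context` term names, the UQ resource URI, and the functions `build_document` and `serialize` are assumptions for this example; the slide itself does not show this document.

```python
import json


def build_document():
    """Return a JSON-LD document describing UQ as a plain Python dict."""
    return {
        "@context": {
            "dbo": "http://dbpedia.org/ontology/",
            "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
            # Short term names mapped to full property URIs.
            "label": {"@id": "rdfs:label", "@language": "en"},
            "affiliation": {"@id": "dbo:affiliation", "@type": "@id"},
            "numberOfStudents": "dbo:numberOfStudents",
            "numberOfUndergraduateStudents": "dbo:numberOfUndergraduateStudents",
        },
        # Assumed DBpedia-style identifier for the university.
        "@id": "http://dbpedia.org/resource/University_of_Queensland",
        "@type": "dbo:University",
        "label": "The University of Queensland",
        "affiliation": "http://dbpedia.org/resource/Group_of_Eight_(Australian_universities)",
        "numberOfStudents": 48771,
        "numberOfUndergraduateStudents": 34228,
    }


def serialize(doc):
    """Serialize the document as indented JSON text."""
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    print(serialize(build_document()))
```

To an ordinary JSON consumer this is just a small object with readable keys; the `@context` is what lets a Linked Data client map those keys back to shared URIs.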

17. credit: https://callimachusproject.org

18. # G8 universities ordered by the number of students
    # at each university.
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?name ?students ?undergrads
    WHERE {
      ?s dbo:affiliation <http://dbpedia.org/resource/Group_of_Eight_(Australian_universities)> .
      ?s rdfs:label ?name .
      OPTIONAL { ?s dbo:numberOfStudents ?students }
      OPTIONAL { ?s dbo:numberOfUndergraduateStudents ?undergrads }
      FILTER ( lang(?name) = "en" )
    }
    ORDER BY DESC(?students)
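
One way to run a query like this from a script is the SPARQL protocol over HTTP. The sketch below, which uses only the Python standard library, builds the GET request and parses a response in the SPARQL JSON results format. The endpoint URL `https://dbpedia.org/sparql`, the helper names, and the `SAMPLE` response (hand-written, with a made-up second row to show a missing OPTIONAL value) are assumptions for illustration, not output captured from DBpedia.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://dbpedia.org/sparql"  # assumed public endpoint

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?name ?students ?undergrads
WHERE {
  ?s dbo:affiliation <http://dbpedia.org/resource/Group_of_Eight_(Australian_universities)> .
  ?s rdfs:label ?name .
  OPTIONAL { ?s dbo:numberOfStudents ?students }
  OPTIONAL { ?s dbo:numberOfUndergraduateStudents ?undergrads }
  FILTER ( lang(?name) = "en" )
}
ORDER BY DESC(?students)
"""


def build_request(endpoint, query):
    """Build (but do not send) a SPARQL protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_results(results):
    """Flatten SPARQL JSON results into one dict per row.

    Variables left unbound by OPTIONAL are simply absent from the row.
    """
    variables = results["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in results["results"]["bindings"]
    ]


# Hand-written sample response; the second row is invented for illustration.
SAMPLE = json.loads("""
{
  "head": {"vars": ["name", "students", "undergrads"]},
  "results": {"bindings": [
    {"name": {"type": "literal", "xml:lang": "en", "value": "The University of Queensland"},
     "students": {"type": "literal", "value": "48771"},
     "undergrads": {"type": "literal", "value": "34228"}},
    {"name": {"type": "literal", "xml:lang": "en", "value": "Example University"}}
  ]}
}
""")

if __name__ == "__main__":
    print(build_request(ENDPOINT, QUERY).full_url)
    for row in rows_from_results(SAMPLE):
        print(row)
```

To execute the query for real, pass the request to `urllib.request.urlopen` and feed the decoded body to `json.loads` before calling `rows_from_results`.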

27. ... or Linked Data on the Web
    my data --[a]--> measurement
    my data --[date]--> 2011-01-01
    my data --[value]--> 0
    my data --[units of measure]--> degrees Centigrade
    my data --[collected at]--> Galway Airport
    my data --[collected by]--> collector
    collector --[a]--> Person
    collector --[first name]--> Michael
    collector --[last name]--> Hausenblas

28. “Linked Data was part of my initial vision for the Web and is an important part of the Web’s future. The Web took off as a web of hyperlinked documents which were exciting to read, but which could not be effectively used as data.” - Tim Berners-Lee

29. The Semantic Web morphed when it hit the marketplace

31. Governments & NGOs publishing & consuming Linked Data

32. 07 Nov 2007

33. 10 Nov 2007

34. 28 Feb 2008

35. 31 Mar 2008

36. 18 Sep 2008

37. 05 Mar 2009

38. 27 Mar 2009

39. 14 Jul 2009

40. 22 Sep 2009

41. 22 Sep 2010

44. • Widens EPA’s audience (justifies relevance) for research and environmental justice
• More cost-effective than relational-backed web portals
• Used for scientific R&D, green chemistry, and more
• Increased transparency
https://opendata.epa.gov

45. 7 Steps to Publish Linked Data Source: W3C Best Practices for Publishing Linked Data, see https://www.w3.org/TR/ld-bp/

46. Step #1 - Identify
Identify the dataset(s) to be modeled
• Request a copy of the logical and physical model of the database(s)
• Obtain data extracts (i.e., databases and/or spreadsheets) or create data in a way that can be replicated.

47. Step #2 - Model Data
Model data without context to allow for reuse and easier merging of data sets
• Traditional DBAs organize data for specified Web services or applications
• In Linked Data, application logic does not drive the data schema, concepts, etc.

48. Step #2 - Modeling (cont)
Look for real-world objects of interest (e.g., people, places, things, locations) and model them.
• Investigate how others are already modeling similar or related data.
• Look for duplication & normalize the data
• Use common sense to decide whether or not to make a link

49. Step #2 - Modeling (cont)
• Connect data from different sources & authoritative vocabularies
• Use URIs as names for your objects
• Put aside immediate needs of any application
• Don’t think about how an application will use your data
• Do think about time and how the data will change over time.

50. Steps #3 & 4 - Name & Describe
Identifiers are at the heart of how things become useful as linked data. We use the same mechanism for connecting data as the Web: the humble HTTP URI. The Web is formed by HTTP URIs that are essentially connections linking pieces of information together.
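
A naming step often comes down to a small, deterministic function that turns a record's label into an HTTP URI. The sketch below is one possible approach, not a W3C-prescribed scheme: the base `http://example.org/id`, the `kind/slug` path layout, and the function name `mint_uri` are all hypothetical.

```python
import re


def mint_uri(base, kind, label):
    """Mint a readable HTTP URI for a thing from its kind and label.

    The same (kind, label) pair always yields the same URI, so the
    conversion can be re-run without changing identifiers.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
    if not slug:
        raise ValueError(f"cannot derive an identifier from {label!r}")
    return f"{base.rstrip('/')}/{kind}/{slug}"


if __name__ == "__main__":
    print(mint_uri("http://example.org/id", "university",
                   "The University of Queensland"))
```

Deriving identifiers from labels is convenient but fragile if labels change; where the source data has a stable key, using that key as the last path segment is usually the safer choice.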

51. Steps #5, 6 & 7 - Convert, Publish & Maintain
    5. Write a script or process to convert the data set repeatedly
    6. Publish to the Web and announce it!
    7. Maintenance strategy
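
For step 5, the point of scripting the conversion is that it can be re-run on every new data extract and produce the same output for the same input. The following is a minimal sketch under stated assumptions: the CSV column names (`id`, `name`, `students`), the `http://example.org/` namespaces, and the function name `convert` are hypothetical, and the output format is N-Triples.

```python
import csv
import io

BASE = "http://example.org/id/university/"  # hypothetical identifier namespace
VOCAB = "http://example.org/def/"            # hypothetical vocabulary namespace
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def convert(csv_text):
    """Convert a CSV extract to N-Triples text.

    Lines are sorted so that repeated runs over the same extract
    give byte-identical output, which keeps diffs meaningful.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "<" + BASE + row["id"] + ">"
        name = row["name"].replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} <{RDFS_LABEL}> "{name}"@en .')
        if row["students"]:
            count = int(row["students"])  # fails loudly on bad input
            lines.append(
                f'{subject} <{VOCAB}numberOfStudents> "{count}"^^<{XSD_INTEGER}> .'
            )
    return "\n".join(sorted(lines)) + "\n"


SAMPLE_CSV = "id,name,students\nuq,The University of Queensland,48771\n"

if __name__ == "__main__":
    print(convert(SAMPLE_CSV), end="")
```

Keeping the script in version control alongside the modeling decisions makes the later maintenance step a matter of editing and re-running it.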

52. Take an iterative approach
    1. Review modeling decisions
    2. Review vocabularies chosen and developed
    3. Modify/update data conversion scripts
    4. Do a maintenance walk-through with real use cases
    5. Show how to explore data with SPARQL and visualizations
    6. Discuss a persistent identifier strategy (think PURLs)

55. Technical DNA of EPA Linked Data Services
• Built on Open Source Software
• Provides downloadable Linked Open Data (RDF, JSON-LD)
• Developer guide includes RESTful API, persistent URLs strategy
• Sample apps on GitHub (https://github.com/USEPA)

56. Power of LOD: combining data sets in a day with Linked Open Data from DBpedia & EPA. Next, the EPA wanted more chemical data linked to their data…
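
Part of why combining sources can be quick is that, once two publishers use the same URI for the same thing, merging their data is a set union with no join keys to negotiate. The sketch below illustrates only that idea. All URIs and values are made up for the example and do not come from DBpedia or EPA data; `merge` and `describe` are hypothetical helper names.

```python
# A hypothetical URI that both sources use for the same chemical.
CHEMICAL = "http://example.org/id/chemical/benzene"

# Source A knows the name of the thing.
source_a = {
    (CHEMICAL, "http://www.w3.org/2000/01/rdf-schema#label", "Benzene"),
}

# Source B knows which (made-up) facility reported it.
source_b = {
    (CHEMICAL, "http://example.org/def/reportedBy",
     "http://example.org/id/facility/42"),
}


def merge(*graphs):
    """Merge any number of triple sets; duplicate statements collapse."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Return every (predicate, object) pair known about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


if __name__ == "__main__":
    combined = merge(source_a, source_b)
    for predicate, obj in describe(combined, CHEMICAL):
        print(predicate, obj)
```

In practice the harder work is agreeing on, or mapping between, the identifiers; the merge itself is trivial once that is done.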

57. Specialist knowledge as Linked Open Data

58. PubChem, the world’s largest open molecular database. Used by the healthcare / life sciences industry worldwide, all as Linked Open Data

59. Shared vocabularies, including SKOS, RDFS and OWL, are the “lingua franca” of data interoperability. Other key vocabularies include Dublin Core, Geo, FOAF, ORG and vCard.
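
In practice, reusing a shared vocabulary means reusing its namespace URI. The sketch below lists the namespaces of the vocabularies named on this slide and expands a compact name such as `foaf:name` into the full property URI. The prefix labels (`dcterms`, `geo`, and so on) are conventional choices rather than anything the slide specifies, and `expand` is a hypothetical helper.

```python
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "dcterms": "http://purl.org/dc/terms/",                # Dublin Core terms
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",     # W3C Basic Geo
    "foaf": "http://xmlns.com/foaf/0.1/",
    "org": "http://www.w3.org/ns/org#",
    "vcard": "http://www.w3.org/2006/vcard/ns#",
}


def expand(curie):
    """Expand a compact name like 'foaf:name' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


if __name__ == "__main__":
    print(expand("foaf:name"))
    print(expand("skos:prefLabel"))
```

Two datasets that both use `foaf:name` for a person's name can be queried together without any field mapping, which is the interoperability the slide refers to.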

60. https://opendata.epa.gov

61. Users: Public, Registered developer
    Clients: Web Browser; Application, Script or automated client
    Interfaces: Resource URIs, REST API, SPARQL endpoint
    Linked Data management system located at a Tier 1 Cloud Provider (FISMA compliant)
    RDF Database
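
A common way for one resource URI to serve both a web browser and an automated client is HTTP content negotiation: the server inspects the `Accept` header and returns HTML or an RDF serialization accordingly. The slide does not say how the EPA service handles this, so the following is only a simplified sketch of the general technique; the function names and the list of offered media types are assumptions.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, in header order."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((media, q))
    return prefs


def matches(media, offered):
    """True if an Accept entry (possibly a wildcard) covers an offered type."""
    if media in ("*/*", offered):
        return True
    return media.endswith("/*") and offered.startswith(media[:-1])


def negotiate(header, available, default):
    """Pick the offered media type with the highest client preference."""
    best, best_q = default, 0.0
    for media, q in parse_accept(header):
        for offered in available:
            if q > best_q and matches(media, offered):
                best, best_q = offered, q
                break
    return best


OFFERED = ["text/html", "text/turtle", "application/ld+json"]

if __name__ == "__main__":
    browser = "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"
    print(negotiate(browser, OFFERED, "text/html"))
    print(negotiate("text/turtle", OFFERED, "text/html"))
```

This ignores parts of the HTTP rules (for example, specificity ordering among equally weighted wildcards), so a production service would normally rely on its web framework's negotiation support.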

62. Linked Data is a gift …
• A worldwide system of linked information systems
• Global addressing scheme for data integration that scales to the Web
• Nearly immediate data integration to billions of facts

63. http://LinkedDataDeveloper.com

64. http://www.oreilly.com/data/free/files/the-global-impact-of-opendata.pdf

65. How do I get started? https://www.w3.org/TR/ld-bp/

66. https://www.w3.org/2012/ldp/charter Enterprise data interoperability

67. Use your super powers for good! @WhoGiveACrapTP

68. http://w3id.org/people/bernhyland/presentations Twitter: @BernHyland Email: bhyland@3roundstones.com
