The Power of Linked Data for Government & Healthcare Information Integration

Government open data strategies aimed at wider access and reuse by entrepreneurs, publishers, and the wider US healthcare delivery industry. Presentation to the OMG Standards Community technical workshop on semantics, held in Reston, VA, on 20 March 2013. Presentation by Bernadette Hyland, CEO of 3 Round Stones, Inc., and co-chair of the W3C Government Linked Data Working Group.

  1. The Power of Linked Data for Government and Healthcare Information Integration. By Bernadette Hyland, CEO, 3 Round Stones; co-chair, W3C Gov’t Linked Data WG. This presentation is at http://slideshare.net/3roundstones. OMG Technical Meeting Special Event, Reston VA, 20-Mar-2013. Wednesday, March 20, 13
  2. Agenda • Government data publication on the Web • Update on EPA Linked Data Service • Healthcare Delivery Industry’s Appetite • Update on W3C Government Linked Data Working Group
  3. 3 Round Stones produces the leading platform for the publication of reusable data on the Web. Our commercially supported Open Source platform is used by Fortune 2000 companies and US Government agencies to collect, publish and reuse data, both on the public Internet and behind institutional firewalls.
  4. http://www.manning.com/dwood/ • http://3roundstones.com/linking-government-data/ • http://3roundstones.com/linking-enterprise-data/
  5. US EPA Linked Data • Cloud-based Linked Data provision of 3 core programs: • 2.9M facilities • 100K substances • 25 years of toxic pollution reports • FISMA compliant • 16 Callimachus templates • Official launch April 2013
  6. US GPO • Cloud-based Linked Data provision of persistent URLs for US Government documents: • 100K+ documents • Used by 1,240 Federal Depository Libraries and the public • In 3rd year of operation • Deemed an essential service supporting the US Congress
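A persistent URL (PURL) stays fixed while the document it names may move; the PURL service answers each request with an HTTP redirect to the document's current location. The Python sketch below illustrates that idea with an in-memory lookup table. The paths and target URLs are hypothetical placeholders, not GPO's actual naming scheme, and the choice of a 302 redirect is an assumption of this sketch.

```python
from http import HTTPStatus
from typing import Optional

# Hypothetical PURL table: persistent path -> current location of the document.
# When a document moves, only this entry changes; the published PURL does not.
PURL_TABLE = {
    "/GPO/example-doc-1": "https://documents.example.gov/2013/example-doc-1.pdf",
}


def resolve_purl(path: str) -> tuple[int, Optional[str]]:
    """Return (HTTP status code, Location header value) for a PURL request."""
    target = PURL_TABLE.get(path)
    if target is None:
        return int(HTTPStatus.NOT_FOUND), None
    # 302 Found: redirect to the current location while clients keep citing the PURL.
    return int(HTTPStatus.FOUND), target


def move_document(path: str, new_location: str) -> None:
    """Point an existing PURL at a document's new location."""
    PURL_TABLE[path] = new_location
```

After a migration, calling move_document changes only the table entry, so every existing citation of the PURL keeps resolving.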
  8. Big Data • Simple data • Complex data • Legacy data
  10. Open Government Data
  11. Growing chorus ... “We’re moving from managing documents to managing discrete pieces of open data and content which can be tagged, shared, secured, mashed up and presented in the way that is most useful for the consumer of that information.” -- Report on Digital Government: Building a 21st Century Platform to Better Serve the American People
  13. Governments’ Goals: governmental transparency and/or improved internal efficiencies (data warehouses)
  16. Open data + open standards + open platforms • Highly scalable computing on the Cloud • Open Web Standards • 5 Star Data (Linked Data), whenever possible • Leverage Open Source tools where practical
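“5 Star Data” refers to Tim Berners-Lee's five-star deployment scheme for open data, in which each star builds on the previous one: (1) on the Web under an open license, (2) machine-readable structured data, (3) a non-proprietary format, (4) W3C standards such as RDF, using URIs to identify things, (5) links to other people's data for context. A minimal Python sketch of the cumulative rating, assuming a hypothetical Dataset record with one flag per criterion:

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical description of a published dataset; field names are illustrative.
    on_web_with_open_license: bool
    machine_readable: bool        # e.g. a spreadsheet rather than a scanned image
    non_proprietary_format: bool  # e.g. CSV rather than a vendor spreadsheet format
    uses_uris_for_things: bool    # W3C standards (RDF, SPARQL) with URIs as identifiers
    links_to_other_data: bool     # links to other publishers' data for context


def star_rating(ds: Dataset) -> int:
    """Count stars; each star requires all previous ones (the scheme is cumulative)."""
    criteria = [
        ds.on_web_with_open_license,
        ds.machine_readable,
        ds.non_proprietary_format,
        ds.uses_uris_for_things,
        ds.links_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break  # a missing lower star caps the rating
        stars += 1
    return stars
```

An openly licensed CSV file rates three stars; republishing it as RDF with URIs adds the fourth, and linking it to related datasets adds the fifth.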
  17. Use a non-proprietary format • Open Web data exchange formats • RDF instead of CSV • Benefits: accessibility, interoperability & reuse • Reduces the risks of: the “super model” data warehouse approach; budget & schedule overruns; confidential info leakage
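To illustrate “RDF instead of CSV”: in CSV the meaning of each column lives only in its header, whereas in RDF each row becomes a resource with its own URI and each value becomes a statement others can reference. A minimal sketch using only Python's standard library, emitting N-Triples; the base URI, vocabulary namespace and column names are hypothetical, and a production converter would use a full RDF library and an established vocabulary.

```python
import csv
import io

# Hypothetical base URI and vocabulary namespace; an agency would mint its own.
BASE = "http://example.gov/id/facility/"
VOCAB = "http://example.gov/def/"


def csv_to_ntriples(csv_text: str, id_column: str) -> list[str]:
    """Turn each CSV row into N-Triples: one subject URI per row, one triple per column.

    Assumes column headers are already URI-safe (no spaces or special characters).
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # the id becomes the URI; empty cells produce no triple
            # Escape backslashes, quotes and newlines so the literal is valid N-Triples.
            literal = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            triples.append(f'{subject} <{VOCAB}{column}> "{literal}" .')
    return triples
```

For example, a row with id 42 yields triples whose subject is the URI http://example.gov/id/facility/42, which any other dataset can then point to.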
  19. Universal Identifiers • The foundation of the Web • Others can reference things • Two references using the same URI refer to the same thing • Quick, easy and scalable • People keep coming back for more!
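Because two references using the same URI refer to the same thing, data from independent publishers can be combined without negotiating a join key. A minimal Python sketch, assuming triples are plain (subject, predicate, object) tuples; all example.gov and example.org URIs are hypothetical.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (subject URI, predicate URI, object)


def merge_by_uri(*sources: list[Triple]) -> dict[str, set[tuple[str, str]]]:
    """Group statements from independent sources under their subject URI.

    Identical URIs denote the same thing, so no record-matching step is
    needed: facts about one subject simply accumulate across sources.
    """
    merged: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for source in sources:
        for subject, predicate, obj in source:
            merged[subject].add((predicate, obj))
    return dict(merged)
```

If one agency publishes a facility's name and another publishes its county under the same URI, the merged result holds both facts about that single facility.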
  21. HELPING DEFINE THE PROCESS: Identify • Model • Name • Describe • Convert • Publish
  22. HELPING DEFINE THE PROCESS: Identify • Model • Name • Describe • Convert • Publish • Maintain
  24. A Path to Success • Start with the basics • Well-curated datasets with relevant data • Integrate related datasets (e.g., EPA chemical substances, toxic releases & facilities) • Reach out to developers early • Emphasize the internal agency benefit • Address data quality ... • Multiple approaches, including crowdsourcing
  25. Social responsibility of government publishers • Must specify a license for use • Publish the frequency of data updates • Ensure data is as accurate as possible • Recognize responsibility to maintain data • Document & follow a persistence strategy • Respond to reports of problematic data
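The first two obligations, a license and a published update frequency, can be stated in machine-readable form alongside the data. A minimal Python sketch using the Dublin Core terms dcterms:license, dcterms:accrualPeriodicity and dcterms:publisher, with frequencies taken from the DCMI Collection Description Frequency Vocabulary. The dataset and agency URIs are hypothetical, and refusing to publish without these fields is an illustrative policy choice, not a W3C requirement.

```python
DCTERMS = "http://purl.org/dc/terms/"
# DCMI Collection Description Frequency Vocabulary (e.g. .../annual, .../monthly).
FREQ = "http://purl.org/cld/freq/"

REQUIRED = ("license", "accrualPeriodicity", "publisher")


def describe_dataset(dataset_uri: str, meta: dict[str, str]) -> list[str]:
    """Emit N-Triples for a dataset's license, update frequency and publisher.

    Raises ValueError if any required field is missing, so a dataset cannot be
    published without stating its license and update frequency.
    """
    missing = [key for key in REQUIRED if key not in meta]
    if missing:
        raise ValueError("missing required metadata: " + ", ".join(missing))
    subject = f"<{dataset_uri}>"
    return [
        f"{subject} <{DCTERMS}license> <{meta['license']}> .",
        f"{subject} <{DCTERMS}accrualPeriodicity> <{FREQ}{meta['accrualPeriodicity']}> .",
        f"{subject} <{DCTERMS}publisher> <{meta['publisher']}> .",
    ]
```

Publishing this description next to the data lets consumers check the license and expected update cadence without contacting the agency.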
  26. Callimachus • http://callimachusproject.org • http://3roundstones.com
  27. Diagram: Content Management System and Linked Data Management System; unstructured vs. structured data and text; Callimachus
  29. Guidance for developers
  31. From EPA • From Wikipedia • OpenStreetMap
  33. We’ve Seen This Before
  35. Diagram: a User and data sources labeled US EPA, US EPA, NOAA, AirNow, SunWise, National Library of Medicine and DBpedia
  36. How much mercury did Elisa’s local cement plant release in 2004?
  37. Linked Data Approach
  39. Finding Hanson Permanente
  40. Finding Mercury Released in 2004
  41. TRI (Toxics Release Inventory) Report
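The walk-through above answers the mercury question in two hops: find the facility, then follow its toxic release reports to the 2004 mercury figure. The same pattern over a toy in-memory graph, in Python. Every URI, predicate, facility name and quantity here is hypothetical (the EPA service's actual vocabulary is not shown in the slides), and a real client would query the service over HTTP, for example with SPARQL.

```python
# Hypothetical, illustrative triples; the URIs, predicate names and the
# quantity are made up and do not reflect real TRI data.
EX = "http://example.gov/"
TRIPLES = [
    (EX + "facility/1", EX + "def/name", "Example Cement Co."),
    (EX + "release/9", EX + "def/facility", EX + "facility/1"),
    (EX + "release/9", EX + "def/chemical", EX + "substance/mercury"),
    (EX + "release/9", EX + "def/year", "2004"),
    (EX + "release/9", EX + "def/amount", "123.0"),
]


def objects(subject: str, predicate: str) -> list[str]:
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate: str, obj: str) -> list[str]:
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def release_amounts(facility_name: str, chemical: str, year: str) -> list[float]:
    amounts = []
    # Step 1: find the facility resource by its name.
    for facility in subjects(EX + "def/name", facility_name):
        # Step 2: follow release reports that link back to the facility,
        # keeping only those for the requested chemical and year.
        for release in subjects(EX + "def/facility", facility):
            if (chemical in objects(release, EX + "def/chemical")
                    and year in objects(release, EX + "def/year")):
                amounts.extend(float(a) for a in objects(release, EX + "def/amount"))
    return amounts
```

Because the facility, substance and release reports share URIs, the second hop is a link traversal rather than a cross-database join.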
  42. Data Reuse
  43. Potential Audience • ✔ Middle school student doing a science project • ✔ Concerned citizen worried about local pollution • ✔ Environmental Science PhD from EPA • ✔ Doctor from NIH writing a research paper
  44. Active PURLs for Clinical Study Aggregation. David Wood (david@3roundstones.com) and Tom Plasterer (Tom.Plasterer@astrazeneca.com).
      The problem: No coordinated view of clinical study information. Information is distributed across departments, subsidiaries and government data sources.
      The solution: Gather, convert, aggregate and format for display. 3 Round Stones and AstraZeneca created a system to allow coordinated views of distributed clinical trial information. The system extended the Callimachus Project, an Open Source management system for Linked Data. Persistent URLs, or PURLs, were used to provide globally unique and resolvable identifiers for each clinical study. The PURL concept was extended to enable PURLs to have multiple targets and for the results of each target to undergo arbitrary transformation. PURLs with such capabilities are called Active PURLs. Information sources relevant to clinical studies were identified, regardless of whether their location was internal or external to the pharmaceutical company’s network. Active PURLs were used to resolve data sources having HTTP endpoints capable of returning XML or textual results. Each information source is dynamically transformed into Resource Description Framework (RDF) formats, and the results from all sources are then merged into a single, temporary graph of RDF data. Information is rendered to end users as coordinated HTML descriptions of each clinical trial using the Callimachus template engine. Machine-readable versions of the data are also available.
      How semantic technologies help: Linked Data techniques can help address the availability of clinical trial information and provide a means to build effective information systems using it. Linked Data techniques allow for "cooperation without coordination". Publishers of data provide context for use by third parties in other portions of a distributed enterprise. Users of Linked Data can combine information from multiple sources. Subsequent publication can create a virtuous circle of positive feedback, allowing researchers, informaticists and support staff to collaboratively and distributively build a reusable knowledge base.
      User experience: (1) Users resolve a URL that provides a unique identifier for a clinical study, drug, chemical or other concept managed by this system. The user may be presented with the URL on HTML pages, search for it via full-text techniques, or discover it via semantic search. (2) Users are presented with a dynamically generated Web page representing aggregated clinical study information. Users are isolated from the complex and distributed information environment.
      Diagram: User resolves a single URI to an Active PURL • Multiple targets queried independently (HTTP-accessible endpoints capable of returning XML or textual content) • Convert XML or textual results to RDF • Render RDF to HTML via template
      Challenges: Distributed queries have many known limitations, such as the introduction of multiple single points of failure in any given PURL resolution. HTTP timeouts, auth/auth errors or other network failures can slow or stop a pipeline from returning correctly. Similarly, distributed queries can result in variable query-time performance due to complex network and endpoint performance variances. Proactive caching and cache management strategies can improve runtime performance and protect end users from the limitations inherent in a distributed query architecture. Caching of intermediate results from endpoints has not yet been implemented.
      Next steps: We intend to continue to address ...
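The Active PURL pipeline described above (one PURL, several targets queried independently, each result converted to RDF, then merged into a single temporary graph) can be sketched in Python as follows. The fetch and transform callables are hypothetical stand-ins for HTTP requests and XML-to-RDF conversion, and collecting per-target errors instead of failing outright is one possible response to the failure modes listed under Challenges, not the Callimachus implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

Triple = tuple[str, str, str]
# Hypothetical target shape: a fetch step (e.g. an HTTP GET returning XML or text)
# paired with a transform step that converts that payload into RDF triples.
Target = tuple[Callable[[], str], Callable[[str], Iterable[Triple]]]


def resolve_active_purl(targets: list[Target]) -> tuple[set[Triple], list[Exception]]:
    """Query every target independently, convert each result to RDF,
    and merge everything into one temporary graph.

    A failing target (timeout, auth error, network failure) is recorded
    rather than aborting the whole resolution.
    """
    def run(target: Target) -> list[Triple]:
        fetch, transform = target
        return list(transform(fetch()))

    graph: set[Triple] = set()
    errors: list[Exception] = []
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(run, t) for t in targets]
        for future in futures:
            try:
                graph.update(future.result())  # re-raises any exception from run()
            except Exception as exc:
                errors.append(exc)
    return graph, errors
```

Returning the partial graph together with the errors lets a page render whatever sources answered; rendering the merged graph to HTML (the template step) is out of scope here.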
  48. http://slideshare.com/3roundstones • Twitter: @BernHyland • Email: bhyland@3roundstones.com • Thank you for participating!
  49. Credits • Gartner (David Newman): “Innovation Insight: Linked Data Drives Innovation Through Information-Sharing Network Effects,” published 15 December 2011 • Linking Government Data, David Wood (ed.), Springer (2011), http://3roundstones.com/linking-government-data/ • Digital Government: Building a 21st Century Platform to Better Serve the American People, US Executive Branch, http://www.whitehouse.gov/sites/default/files/omb/egov/digital-government/digital-government.html • W3C Linked Data Cookbook, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook • All other photos and images © 2010-2012 3 Round Stones, Inc. and released under a CC BY-SA license
  50. This work is Copyright © 2011-2012 3 Round Stones Inc. It is licensed under the Creative Commons Attribution 3.0 Unported License. Full details at: http://creativecommons.org/licenses/by/3.0/ You are free: to Share (to copy, distribute and transmit the work); to Remix (to adapt the work). Under the following conditions: Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.
