The Power of Linked Data for Government and Healthcare Information Integration
By Bernadette Hyland
CEO, 3 Round Stones; co-chair, W3C Government Linked Data Working Group
This presentation is available at http://slideshare.net/3roundstones
OMG Technical Meeting Special Event, Reston, VA
20-Mar-2013
Wednesday, March 20, 13                                                           1
Agenda

                   • Government data publication on the Web
                   • Update on EPA Linked Data Service
                   • Healthcare Delivery Industry's Appetite
                   • Update on W3C Government Linked Data Working Group



3 Round Stones produces the leading platform for
       the publication of reusable data on the Web. Our
       commercially supported Open Source platform is
       used by the Fortune 2000 and US Government
       agencies to collect, publish and reuse data, both on
       the public Internet and behind institutional firewalls.




http://www.manning.com/dwood/




                          http://3roundstones.com/linking-government-data/




                          http://3roundstones.com/linking-enterprise-data/


US EPA Linked Data
            • Cloud-based Linked Data provision of 3 core
            programs:

                 • 2.9M Facilities
                 • 100K substances
                 • 25 years of toxic pollution reports
           • FISMA compliant
           • 16 Callimachus templates
           • Official launch April 2013
US GPO
         • Cloud-based Linked Data provision of persistent
         URLs for US Government documents:

              • 100k+ documents
              • Used by 1,240 Federal Depository Libraries and the public

         • In 3rd year of operation

         • Deemed an essential service supporting the US Congress


Big Data
                            Simple data
                            Complex data
                            Legacy data




Open Government Data




Growing chorus ...
             “We’re moving from managing
             documents to managing discrete pieces of
             open data and content which can be
             tagged, shared, secured, mashed up and
             presented in the way that is most useful
             for the consumer of that information.”
                            -- Report on Digital Government: Building a 21st Century Platform to
                                                              Better Serve the American People




Governments
           Goals: Governmental transparency and/or improved
                  internal efficiencies (data warehouses)




Open data + open standards +
               open platforms

                          Highly scalable computing on the Cloud
                          Open Web Standards

                          5 Star Data (Linked Data), whenever possible
                          Leverage Open Source tools where practical


Use a non-proprietary format
              •   Open Web data exchange formats
                  •   RDF instead of CSV
              •   Benefits

                  •   Accessibility, Interoperability & Re-use
              •   Reduces the risks of
                  •   “Super model” data warehouse approach
                  •   Budget & schedule overruns
                  •   Confidential info leakage
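
To make the "RDF instead of CSV" point concrete, here is a minimal sketch of turning CSV rows into N-Triples statements using only the Python standard library. The base URIs, column names and sample rows are hypothetical placeholders (example.org is reserved for illustration); they are not EPA identifiers, and this is not how any agency necessarily performs the conversion.

```python
import csv
import io

# Hypothetical base URIs; example.org is reserved for illustration.
BASE = "http://example.org/facility/"
VOCAB = "http://example.org/vocab#"

# Made-up sample rows standing in for a published CSV file.
CSV_TEXT = """id,name,state
F001,Example Cement Plant,CA
F002,Example Paper Mill,OR
"""


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(text: str) -> list:
    """Emit one N-Triples statement per non-key column of each CSV row."""
    statements = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}{row['id']}>"  # each row gets its own URI
        for column, value in row.items():
            if column == "id":
                continue
            statements.append(
                f'{subject} <{VOCAB}{column}> "{escape_literal(value)}" .'
            )
    return statements


if __name__ == "__main__":
    print("\n".join(csv_to_ntriples(CSV_TEXT)))
```

Unlike the CSV, every statement in the output names its subject and property with a URI, so it can be combined with data from other publishers without agreeing on column layouts first.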

Universal Identifiers
                •         It’s the foundation of the
                          Web

                •         Others can reference things

                •         Two references with the
                          same URI are the same
                          thing

                •         Quick, easy and scalable

                •         People keep coming back
                          for more!!
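
The claim that two references with the same URI denote the same thing can be illustrated with a small sketch. It assumes two independently published datasets, modelled here as Python sets of (subject, predicate, object) tuples; all URIs and values are hypothetical placeholders.

```python
# Triples are (subject, predicate, object) tuples. All URIs and values are
# hypothetical placeholders, not real agency identifiers.
FACILITY = "http://example.org/facility/F001"

dataset_a = {
    (FACILITY, "http://example.org/vocab#name", "Example Cement Plant"),
}
dataset_b = {
    (FACILITY, "http://example.org/vocab#county", "Example County"),
}


def describe(uri, *datasets):
    """Return sorted (predicate, value) pairs that any dataset states about uri."""
    merged = set().union(*datasets)  # merging graphs is a plain set union
    return sorted((pred, obj) for subj, pred, obj in merged if subj == uri)


if __name__ == "__main__":
    for predicate, value in describe(FACILITY, dataset_a, dataset_b):
        print(predicate, "->", value)
```

Because both publishers used the same URI for the facility, the merge needs no matching or reconciliation step; the shared identifier does the join.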


HELPING DEFINE THE PROCESS



           Identify       Model   Name    Describe   Convert   Publish




                                         Maintain




A Path to Success
          •   Start with the basics
              •   Well curated datasets with relevant data
          •   Integrate related datasets (e.g., EPA chemical
              substances, toxic releases & facilities)
          •   Reach out to developers early
          •   Emphasize the internal agency benefit
          •   Address data quality ...
              •   Multiple approaches including crowdsourcing


Social responsibility of
                   government publishers
                          •   Must specify a license for use

                          •   Publish frequency of data updates

                          •   Ensure data is as accurate as possible

                          •   Recognize responsibility to maintain data

                          •   Document & follow a persistence strategy

                          •   Respond to reports of problematic data



Callimachus
                          http://callimachusproject.org
                            http://3roundstones.com



[Diagram: Callimachus shown with a CONTENT MANAGEMENT SYSTEM and a LINKED DATA MANAGEMENT SYSTEM. Labels: unstructured data, structured data, text.]
Guidance for developers




From EPA
                          From Wikipedia




                          Open Street Map

We’ve Seen This Before




[Diagram: a User and the data sources US EPA AirNow, US EPA SunWise, NOAA, DBpedia and National Library of Medicine]
How much mercury did Elisa’s local cement plant release in 2004?
Linked Data Approach
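
As a rough illustration of how linked data answers a question like the one about mercury releases, the sketch below follows links from a facility to its reports and sums the matching quantities. It uses in-memory tuples rather than a real query language or service; every URI, property name and number is a hypothetical placeholder and not EPA data.

```python
# All URIs and numbers below are hypothetical placeholders, not EPA data.
EX = "http://example.org/"
FACILITY = EX + "facility/F001"
MERCURY = EX + "substance/mercury"

TRIPLES = {
    (FACILITY, EX + "vocab#name", "Example Cement Plant"),
    (EX + "report/R1", EX + "vocab#facility", FACILITY),
    (EX + "report/R1", EX + "vocab#substance", MERCURY),
    (EX + "report/R1", EX + "vocab#year", 2004),
    (EX + "report/R1", EX + "vocab#pounds", 1.0),  # placeholder quantity
    (EX + "report/R2", EX + "vocab#facility", FACILITY),
    (EX + "report/R2", EX + "vocab#substance", MERCURY),
    (EX + "report/R2", EX + "vocab#year", 2005),
    (EX + "report/R2", EX + "vocab#pounds", 2.0),  # placeholder quantity
}


def objects(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def pounds_released(facility, substance, year):
    """Follow facility -> report links and sum the matching report quantities."""
    reports = {
        s for s, p, o in TRIPLES if p == EX + "vocab#facility" and o == facility
    }
    total = 0.0
    for report in reports:
        matches_substance = substance in objects(report, EX + "vocab#substance")
        matches_year = year in objects(report, EX + "vocab#year")
        if matches_substance and matches_year:
            total += sum(objects(report, EX + "vocab#pounds"))
    return total


if __name__ == "__main__":
    print(pounds_released(FACILITY, MERCURY, 2004))
```

In practice the same traversal would typically be expressed as a query against the published data; the point of the sketch is only that the facility, substance and report records connect through shared URIs.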


Finding Hanson Permanente




Finding Mercury Released in 2004
[Screenshot with numbered callouts 1 and 2]
TRI Report




Data Reuse




Potential Audience
 ✔ Middle school student doing a science project
 ✔ Concerned citizen worried about local pollution
 ✔ Environmental Science PhD from EPA
 ✔ Doctor from NIH writing a research paper
Active PURLs for Clinical Study Aggregation
David Wood (1) and Tom Plasterer (2)
(1) david@3roundstones.com, (2) Tom.Plasterer@astrazeneca.com

The problem: No coordinated view of clinical study information. Information is distributed across departments, subsidiaries and government data sources.

The solution: Gather, convert, aggregate and format for display
3 Round Stones and AstraZeneca created a system to allow coordinated views of distributed clinical trial information. The system extended the Callimachus Project, an Open Source management system for Linked Data.
Persistent URLs, or PURLs, were used to provide globally unique and resolvable identifiers for each clinical study. The PURL concept was extended to enable PURLs to have multiple targets and for the results of each target to undergo arbitrary transformation. PURLs which have such capabilities are called Active PURLs.
Information sources relevant to clinical studies were identified, regardless of whether their location was internal or external to the pharmaceutical company's network. Active PURLs were used to resolve data sources having HTTP endpoints capable of returning XML or textual results. Each information source is dynamically transformed into Resource Description Framework (RDF) formats and all sources' results then merged into a single, temporary graph of RDF data.
Information is rendered to end users as coordinated HTML descriptions regarding each clinical trial using the Callimachus template engine. Machine-readable versions of the data are also available.
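
The gather, convert, aggregate and render steps described above can be sketched roughly as follows. This is an illustration of the general idea only, not the Callimachus implementation: the `Target` class, the two transform functions, the example.org URIs and the stubbed fetch functions (standing in for HTTP requests) are all assumptions made for the example.

```python
import html
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

STUDY = "http://example.org/study/S1"  # hypothetical PURL for one study
VOCAB = "http://example.org/vocab#"    # hypothetical vocabulary namespace


@dataclass
class Target:
    """One data source behind an Active PURL (hypothetical structure)."""
    fetch: Callable[[], str]                      # stand-in for an HTTP GET
    transform: Callable[[str], Iterable[Triple]]  # raw result -> triples


def xml_to_triples(raw: str) -> List[Triple]:
    """Map each child element of the XML root to one triple about the study."""
    root = ET.fromstring(raw)
    return [(STUDY, VOCAB + child.tag, (child.text or "").strip()) for child in root]


def text_to_triples(raw: str) -> List[Triple]:
    """Map 'key: value' lines of plain text to triples about the study."""
    triples = []
    for line in raw.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            triples.append((STUDY, VOCAB + key.strip(), value.strip()))
    return triples


def resolve(targets: Iterable[Target]) -> Set[Triple]:
    """Query every target independently and merge into one temporary graph."""
    graph: Set[Triple] = set()
    for target in targets:
        graph.update(target.transform(target.fetch()))
    return graph


def render_html(graph: Set[Triple]) -> str:
    """Render the merged graph as a simple HTML table."""
    rows = "".join(
        f"<tr><td>{html.escape(p)}</td><td>{html.escape(o)}</td></tr>"
        for _, p, o in sorted(graph)
    )
    return f"<table>{rows}</table>"


# Stubbed sources with made-up content; real targets would be HTTP endpoints.
TARGETS = [
    Target(lambda: "<study><title>Example Study</title><phase>2</phase></study>",
           xml_to_triples),
    Target(lambda: "sponsor: Example Sponsor\nstatus: recruiting", text_to_triples),
]

if __name__ == "__main__":
    print(render_html(resolve(TARGETS)))
```

Each target carries its own transform, which mirrors the idea that the result of every target may undergo a different transformation before the merge.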
How semantic technologies help
Linked Data techniques can both improve the availability of clinical trial information and provide a means to build effective information systems using it. Linked Data techniques allow for "cooperation without coordination". Publishers of data provide context for use by third parties in other portions of a distributed enterprise. Users of Linked Data can combine information from multiple sources. Subsequent publication can create a virtuous circle of positive feedback, allowing researchers, informaticists and support staff to collaboratively and distributively build a reusable knowledge base.

User experience
1. Users resolve a URL that provides a unique identifier for a clinical study, drug, chemical or other concept managed by this system. The user may be presented with the URL on HTML pages, search for it via full-text techniques or discover it via semantic search.
2. Users are presented with a dynamically generated Web page representing aggregated clinical study information. Users are isolated from the complex and distributed information environment.

[Diagram: (1) User resolves a single URI to an Active PURL; multiple targets queried independently; HTTP-accessible endpoints capable of returning XML or textual content; convert XML or textual results to RDF; render RDF to HTML via template]

Challenges
Distributed queries have many known limitations, such as the introduction of multiple single points of failure in any given PURL resolution. HTTP timeouts, auth/auth errors or other network failures can slow or stop a pipeline from returning correctly.
Similarly, distributed queries can result in variant query-time performance due to complex network and endpoint performance variances.
Proactive caching and cache management strategies can improve runtime performance and protect end users from the limitations inherent in a distributed query architecture. Caching of intermediate results from endpoints has not yet been implemented.

References
1. Callimachus Project,

Next steps
We intend to continue to address
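
The caching idea raised under Challenges could look something like the sketch below. The poster states that caching of intermediate results had not been implemented, so this is purely a hypothetical illustration: the `EndpointCache` class, its time-to-live policy and the stale-result fallback are assumptions, not part of the described system.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class EndpointCache:
    """Cache raw endpoint results for ttl seconds; serve stale data on failure."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so the behaviour can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str, fetch: Callable[[str], str]) -> Optional[str]:
        """Return a fresh cached value, a newly fetched one, or a stale fallback."""
        now = self.clock()
        cached = self._entries.get(url)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # fresh hit: no network call needed
        try:
            result = fetch(url)
        except OSError:
            # Timeouts and connection errors are OSError subclasses in Python.
            # Fall back to the stale copy if one exists, otherwise give up.
            return cached[1] if cached is not None else None
        self._entries[url] = (now, result)
        return result


if __name__ == "__main__":
    cache = EndpointCache(ttl=60)
    # Stubbed fetch; a real one would perform an HTTP request.
    print(cache.get("http://example.org/endpoint", lambda url: "<study/>"))
```

With something like this in front of each target, a slow or failing endpoint would degrade one part of the aggregated page to older data instead of stopping the whole resolution.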
http://slideshare.com/3roundstones

                  Twitter: @BernHyland
           Email: bhyland@3roundstones.com

                      Thank you for participating!!


Credits


David Newman: Gartner, “Innovation Insight: Linked Data Drives Innovation Through Information-Sharing Network Effects”, published 15 December 2011

David Wood, ed.: Linking Government Data, Springer (2011), http://3roundstones.com/linking-government-data/

US Executive Branch: Digital Government Strategy: Building a 21st Century Platform to Better Serve the American People, http://www.whitehouse.gov/sites/default/files/omb/egov/digital-government/digital-government.html

W3C Linked Data Cookbook: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook




          All other photos and images © 2010-2012 3 Round Stones, Inc. and released under a CC-by-sa license




This work is Copyright © 2011-2012 3 Round Stones Inc.
                          It is licensed under the Creative Commons Attribution 3.0 Unported License
                          Full details at: http://creativecommons.org/licenses/by/3.0/

                          You are free:

                                  to Share — to copy, distribute and transmit the work



                                  to Remix — to adapt the work



                          Under the following conditions:
                                  Attribution. You must attribute the work in the manner specified by the
                                  author or licensor (but not in any way that suggests that they endorse
                                  you or your use of the work).

                                  Share Alike. If you alter, transform, or build upon this work, you may
                                  distribute the resulting work only under the same or similar license to this
                                  one.



