Sharing Data on the Web

                                    5-Mar-2013
                        Linked Data Overview for US EPA
                       Office of Pollution Prevention & Toxics
                        By Bernadette Hyland & Luke Ruth

Tuesday, March 5, 13
Agenda
                       • Intros ...
                       • Trends in data management
                        • Government data publication
                       • Update on EPA Linked Data Service
                       • EPA OPPT sharing data on the Web
                        • Review Next steps ...

3 Round Stones produces the leading platform for
        the publication of reusable data on the Web. Our
        commercially supported Open Source platform is
        used by the Fortune 2000 and US Government
        agencies to collect, publish and reuse data, both on
        the public Internet and behind institutional firewalls.




US EPA Linked Data
             • Cloud-based Linked Data provision of 3 core
             programs:

                  • 2.9M Facilities
                  • 100K substances
                  • 25 years of toxic pollution reports
            • FISMA compliant
            • 16 Callimachus templates
            • Official launch April 2013

Guidance for developers




US GPO
         • Cloud-based Linked Data provision of persistent
         URLs for US Government documents:

               • 100k+ documents
               • Used by 1,240 Federal Depository Libraries and
               public

         • In 3rd year of operation

         • Deemed an essential service supporting US
         Congress


http://www.manning.com/dwood/




                       http://3roundstones.com/linking-government-data/




                       http://3roundstones.com/linking-enterprise-data/


Trends in government
                         data management


Open Government Data




Growing chorus ...
             “We’re moving from managing
             documents to managing discrete pieces of
             open data and content which can be
             tagged, shared, secured, mashed up and
             presented in the way that is most useful
             for the consumer of that information.”
                         -- Report on Digital Government: Building a 21st Century Platform to
                                                           Better Serve the American People




Photo credit: http://www.flickr.com/photos/glennharper/4452247708/

Big Data
                         Simple data
                         Complex data
                         Legacy data




Governments
           Goals: Governmental transparency and/or improved
                  internal efficiencies (data warehouses)




HELPING DEFINE THE PROCESS



           Identify    Model   Name    Describe   Convert   Publish




                                      Maintain




Path to Success
          •   Start easy
               •   Well curated datasets with relevant data
          •   Reach out to developers
          •   Get others involved early
          •   Ensure internal benefit
          •   Integrate related datasets
          •   Address data quality ...
               •   Multiple approaches, including crowdsourcing


Put it on the Web
                       •   Upload & share it
                       •   Document what is available
                       •   Document how to use it
                           •   Solve a customer need
                       •   Encourage feedback
                           •   Continuous improvement




Use a non-proprietary format
               •   Open Web data exchange formats that improve access
                   and re-use
                   •   RDF instead of CSV
               •   Benefits
                   •   Accessibility & Interoperability
               •   Reduce risk of
                   •   Confidential info
                   •   Software viruses
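
The "RDF instead of CSV" point above can be sketched as follows. This is a minimal illustration, not the EPA service's actual conversion: the base URIs, the column names and the two sample rows are made up for the example, and the output is plain N-Triples built with the Python standard library only.

```python
import csv
import io

# Hypothetical base URIs, invented for this sketch (not EPA identifiers).
BASE = "http://example.org/facility/"
VOCAB = "http://example.org/vocab#"

# Made-up sample rows standing in for a CSV export.
CSV_TEXT = """id,name,state
001,Example Plant,CA
002,Sample Works,TX
"""


def literal(value):
    """Quote a plain string as an N-Triples literal (backslash and quote only)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def csv_to_ntriples(text):
    """Emit one triple per non-id column, using the id column to name the row."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}{row['id']}>"
        for column, value in row.items():
            if column == "id":
                continue
            lines.append(f"{subject} <{VOCAB}{column}> {literal(value)} .")
    return lines


if __name__ == "__main__":
    for line in csv_to_ntriples(CSV_TEXT):
        print(line)
```

Each row gets a URL as its name and each column becomes a named link, so the meaning of a value no longer depends on its position in a file. A real conversion would reuse published vocabularies rather than an invented one, and would escape newlines and other control characters in literals.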

Open data + open standards +
                open platforms

                       Highly scalable computing & hosting via the
                       Cloud
                       International Data Exchange Standards

                       5 Star Data (Linked Data)
                       Leverage Open Source tools

It’s the Web of Data

                       •   Universal unidirectional
                           links using URLs

                       •   “Cooperation without
                           coordination”

                       •   It’s simple ... nodes and
                           links




Universal Identifiers
                 •     It’s the foundation of the
                       Web

                 •     Others can reference things

                 •     Two references with the
                       same URI are the same
                       thing

                 •     Quick, easy and scalable

                 •     People keep coming back
                       for more!!
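
As a rough sketch of why "two references with the same URI are the same thing" matters, the example below merges statements from two independent sources. The URIs, property names and values are hypothetical, and triples are modelled as plain Python tuples rather than with an RDF library.

```python
# Hypothetical identifiers, invented for this sketch.
FACILITY = "http://example.org/facility/001"

# Two publishers describe the same facility without coordinating.
source_a = {
    (FACILITY, "http://example.org/vocab#name", "Example Plant"),
}
source_b = {
    (FACILITY, "http://example.org/vocab#locatedIn", "http://example.org/place/CA"),
}


def merge(*graphs):
    """Merging graphs is a set union: identical triples collapse to one."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Return every (property, value) pair stated about one URI, sorted."""
    return sorted((p, o) for s, p, o in graph if s == subject)


if __name__ == "__main__":
    for prop, value in describe(merge(source_a, source_b), FACILITY):
        print(prop, value)
```

Because both sources use the same URI for the facility, the union alone brings their statements together; no join key or matching step has to be agreed in advance.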


Social Responsibility
                       •   Responsibility to maintain published data

                       •   Publish frequency of data updates

                       •   Have a persistence strategy

                       •   Ensure data is as accurate as possible

                       •   Respond to reports of problematic data




Data-driven Web apps using Callimachus
     • US Legislation + enterprise data
     • Clinical Trials + enterprise linked data
     • DBpedia + enterprise datasets

User

                       • NOAA
                       • US EPA AirNow
                       • US EPA SunWise
                       • DBpedia
                       • National Library of Medicine



From EPA
                       From Wikipedia




                       Open Street Map

We’ve Seen This Before




HOW IT IS DONE TODAY ...




Audience for EPA Data
    • Middle school student doing a science project
    • Concerned citizen worried about local pollution
    • Environmental Science PhD from EPA
    • Doctor from NIH writing a research paper




How much mercury did
             Hanson Permanente Cement
                  release in 2004?



Web Portals


Finding Hanson Permanente




Finding Mercury Released in 2004




Compliance Report




Potential Audience
 • X Middle school student doing a science project

 • X Concerned citizen worried about local pollution

 • ✔ Environmental Science PhD from EPA

 • X Doctor from NIH writing a research paper




Linked Data
                        Approach


Finding Hanson Permanente




Finding Mercury Released in 2004




TRI Report




Data Reuse




Potential Audience
 • ✔ Middle school student doing a science project

 • ✔ Concerned citizen worried about local pollution

 • ✔ Environmental Science PhD from EPA

 • ✔ Doctor from NIH writing a research paper




Credits


                       David Newman, Gartner: “Innovation Insight: Linked Data Drives Innovation Through
                       Information-Sharing Network Effects”, published 15 December 2011

                       David Wood, ed.: Linking Government Data, Springer (2011)
                       http://3roundstones.com/linking-government-data/

                       US Executive Branch: Digital Government Strategy: Building a 21st Century Platform to
                       Better Serve the American People
                       http://www.whitehouse.gov/sites/default/files/omb/egov/digital-government/digital-government.html

                       W3C Linked Data Cookbook
                       http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook




          All other photos and images © 2010-2012 3 Round Stones, Inc. and released under a CC-by-sa license




This work is Copyright © 2011-2012 3 Round Stones Inc.
                       It is licensed under the Creative Commons Attribution 3.0 Unported License
                       Full details at: http://creativecommons.org/licenses/by/3.0/

                       You are free:

                               to Share — to copy, distribute and transmit the work



                               to Remix — to adapt the work



                       Under the following conditions:
                               Attribution. You must attribute the work in the manner specified by the
                               author or licensor (but not in any way that suggests that they endorse
                               you or your use of the work).

                               Share Alike. If you alter, transform, or build upon this work, you may
                               distribute the resulting work only under the same or similar license to this
                               one.




