Linked Data Overview - structured data on the web for US EPA 20140203


This presentation provides a jargon-free overview of Linked Open Data. The US EPA is using Linked Data to publish US Government data. The Linked Data approach makes it easier to combine data from multiple sources and lowers costs.

Published in: Education, Technology

  1. 1. Linked Data: Structured Data on the Web (the jargon-free version) US EPA Linked Data Bernadette Hyland, CEO @BernHyland General: @3RoundStones Main +1-877-290-2127
  2. 2. Agenda • Intros ... • What is the need? • Jargon-free overview of Linked Open Data • Trends in data management • Government data publication • EPA is moving towards Linked Data
  3. 3. Demand for environmental data •High demand for improved information platforms to publish, share and visualize integrated data •e.g., chemicals, pollution, air quality, regulated facilities •Goal: Increase data quality & comparability to facilitate access & re-use
  4. 4. Data Sharing & Management Snafu in 3 short acts: feature=player_embedded&v=N2zK3sAtr-4
  5. 5. RDF is a lingua franca for data exchange
  6. 6. • Linked Data is about publishing and consuming data using international data standards • Based on a 20+ year old idea • A system of linked information systems
  7. 7. Government goals: governmental transparency and/or improved internal efficiencies (data warehouses)
  8. 8. What is driving us? “We’re moving from managing documents to managing discrete pieces of open data and content which can be tagged, shared, secured, mashed up and presented in the way that is most useful for the consumer of that information.” -- Report on Digital Government: Building a 21st Century Platform to Better Serve the American People
  9. 9. Global requirements • Comprehensively link legislation & regulations for more effective government • Explain context, source, version & publication date with the data itself • We need global standards for metadata
  10. 10. US EPA publishes lots of CSV files ...
  11. 11. [Charts] Daily volumes (2013), axis up to 5 trillion: online ad impressions, emails and tweets (values shown: 4.8T, 294B, 230M) • Digital information produced: 1.8 ZB (2012) and 35 ZB (2020) • 5% annual growth in IT spending • 40% annual growth in data produced
  12. 12. The United States in 2012 • 314 million total population • 90 million software end users • 55 million users of spreadsheets/databases • 13 million “end user programmers” • 3 million professional programmers
  13. 13. “Most programs today are written not by professional software developers, but by people with expertise in other domains working towards goals for which they need computational support.”
  14. 14. Data in the Physical World Readable by people
  15. 15. Readable by motivated people Machine readable
  16. 16. Schemas/Vocabularies Someone else (we don’t know)
  17. 17. Which Copy?
  18. 18. Today’s Data on the Web
  19. 19. Lack of Context
  20. 20. Required Context
  21. 21. [Diagram] My data: a measurement with date 2011-01-01, value 0 and units of measure degrees Centigrade; collected at Galway Airport; collected by a collector who is a Person with first name Michael and last name Hausenblas
  22. 22. Linked Data on the Web [Diagram] The same measurement graph (date 2011-01-01, value 0, units of measure degrees Centigrade, collected at Galway Airport, collected by the Person Michael Hausenblas), now placed on the Web
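The measurement diagram on these two slides can be written down as a set of subject-predicate-object statements, which is the RDF data model the deck refers to. The following is a minimal sketch, not the presenter's actual data: every `http://example.org/...` URI and every vocabulary term (`collectedAt`, `unitsOfMeasure`, and so on) is a hypothetical name invented for illustration, and only the literal values come from the slide. The helper prints the statements in N-Triples, one of several standard RDF syntaxes.

```python
# Hypothetical namespace: example.org stands in for real, resolvable URIs.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"

# A statement is a (subject, predicate, object) triple.
# URIs are plain strings; literals are (value, datatype URI) tuples.
triples = [
    (EX + "measurement/1", RDF_TYPE, EX + "vocab/Measurement"),
    (EX + "measurement/1", EX + "vocab/date", ("2011-01-01", XSD + "date")),
    (EX + "measurement/1", EX + "vocab/value", ("0", XSD + "decimal")),
    (EX + "measurement/1", EX + "vocab/unitsOfMeasure",
     ("degrees Centigrade", XSD + "string")),
    (EX + "measurement/1", EX + "vocab/collectedAt", EX + "place/galway-airport"),
    (EX + "measurement/1", EX + "vocab/collectedBy", EX + "person/michael"),
    (EX + "person/michael", RDF_TYPE, EX + "vocab/Person"),
    (EX + "person/michael", EX + "vocab/firstName", ("Michael", XSD + "string")),
    (EX + "person/michael", EX + "vocab/lastName", ("Hausenblas", XSD + "string")),
]


def term(t):
    """Render one URI or typed literal in N-Triples syntax."""
    if isinstance(t, tuple):
        value, datatype = t
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"^^<{datatype}>'
    return f"<{t}>"


def to_ntriples(statements):
    """Serialize triples as N-Triples text, one statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in statements)


print(to_ntriples(triples))
```

Because the context (who collected the value, where, and in which units) travels as statements alongside the value itself, a consumer does not need a separate schema document to interpret the number.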
  23. 23. Summary of Problems • How can we archive our data in an open manner? • How can we record data context? • How can we record data provenance? • How can we know whether our data is up to date? • How can we share our data with others?
  24. 24. Linked Data is a way to answer these questions
  25. 25. Linked Data • Provides an international standard mechanism to put reusable data on the World Wide Web • Provides a single data model with multiple formats • Provides context, provenance and access • Allows for both human and machine reuse
  26. 26. Linked Data Principles • Name data files and elements with URIs • Use HTTP URIs so people can resolve them on the Web • Provide useful information at those URIs, using the standards (RDF, SPARQL) • Include links to other URIs so people can discover more information.
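The four principles above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a description of how the EPA service works: the facility URI, the `locatedIn` predicate and the link targets are hypothetical, and the request is only prepared, never sent. `text/turtle` is the media type for Turtle, a common RDF syntax.

```python
from urllib.parse import urlparse
from urllib.request import Request

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def is_http_uri(value):
    """Principle 2: names should be HTTP(S) URIs so they can be resolved."""
    return isinstance(value, str) and urlparse(value).scheme in ("http", "https")


def build_request(uri, media_type="text/turtle"):
    """Principle 3: prepare (but do not send) a GET asking for RDF at the URI."""
    if not is_http_uri(uri):
        raise ValueError(f"not an HTTP URI: {uri}")
    return Request(uri, headers={"Accept": media_type})


def outgoing_links(triples, subject):
    """Principle 4: list the other URIs a resource links to."""
    return [o for s, _, o in triples if s == subject and is_http_uri(o)]


# Hypothetical data: a facility named with a URI (principle 1).
facility = "http://example.org/facility/1"
triples = [
    (facility, "http://example.org/vocab/name", "Example Facility"),
    (facility, "http://example.org/vocab/locatedIn", "http://example.org/place/1"),
    (facility, OWL_SAME_AS, "http://example.com/other-dataset/facility/abc"),
]

request = build_request(facility)
print(request.get_method(), request.full_url, request.get_header("Accept"))
for link in outgoing_links(triples, facility):
    print("follow:", link)
```

A client that follows the returned links and repeats the same request pattern is how "discover more information" works in practice: each URI is both a name and a place to look.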
  27. 27. US EPA Linked Data • Cloud-based Linked Data provision • 2.9M Facilities (FRS) • 100K substances (SRS) • 25 years of toxic pollution reports (TRI) • 3 years of chemical usage reports (CDR) • Considering: Hazardous & non-hazardous waste management (RCRA) & GHG data • FISMA compliant • Millions of pages driven by < 20 Web templates • Launch Spring 2014
  28. 28. From EPA From Wikipedia Open Street Map
  29. 29. HOW IT IS DONE TODAY ...
  30. 30. Audience for EPA Data • Middle school student doing a science project • Concerned citizen worried about local pollution • Environmental Science PhD from EPA • Doctor from NIH writing a research paper
  31. 31. How much mercury did Hanson Permanente Cement release in 2004?
  32. 32. Envirofacts
  33. 33. Finding Hanson Permanente
  34. 34. Finding Mercury Released in 2004
  35. 35. Compliance Report
  36. 36. Potential Audience • ✗ Middle school student doing a science project • ✗ Concerned citizen worried about local pollution • ✔ Environmental Science PhD from EPA • ✗ Doctor from NIH writing a research paper
  37. 37. Linked Data
  38. 38. Finding Hanson Permanente
  39. 39. Finding Mercury Released in 2004 1 2
  40. 40. TRI Report
  41. 41. Data Reuse
  42. 42. Potential Audience • ✔ Middle school student doing a science project • ✔ Concerned citizen worried about local pollution • ✔ Environmental Science PhD from EPA • ✔ Doctor from NIH writing a research paper
  43. 43. Increasing the audience of US EPA data consumers
  44. 44. NOAA, EPA AirNow, EPA Sunwise, Wikipedia, NLM
  45. 45. Increase re-use by publishing Linked Data • Empower users to create their own views of data to satisfy different applications • Build a community around the data in which users help each other to curate and connect as needed • Skip the supermodel - Leave data in the multiple “best of breed” systems; wrap and expose on the Web of Data
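The "wrap and expose" idea can be illustrated with a small sketch: rows from an existing CSV export are wrapped as triples without changing the source system, then combined with statements from a second source that uses the same URI for the same facility. Everything here is a placeholder invented for illustration (the column names, the facility rows, the `example.org` URIs and the second source); it is not EPA data or the EPA's implementation.

```python
import csv
import io

# Hypothetical namespace and placeholder CSV content.
EX = "http://example.org/"
CSV_TEXT = """facility_id,name,state
F001,Example Cement Plant,CA
F002,Example Paper Mill,OR
"""


def csv_to_triples(text, id_column="facility_id"):
    """Wrap each CSV row as triples: one subject URI per row, one predicate per column."""
    triples = set()
    for row in csv.DictReader(io.StringIO(text)):
        subject = EX + "facility/" + row[id_column]
        for column, value in row.items():
            if column != id_column:
                triples.add((subject, EX + "vocab/" + column, value))
    return triples


def describe(triples, subject):
    """Collect everything known about one subject, keyed by short predicate name."""
    return {p.rsplit("/", 1)[-1]: o for s, p, o in triples if s == subject}


# A second, independently maintained source that reuses the same facility URI.
other_source = {
    (EX + "facility/F001", EX + "vocab/county", "Example County"),
}

# Because both sources share one data model, combining them is a set union.
merged = csv_to_triples(CSV_TEXT) | other_source
print(describe(merged, EX + "facility/F001"))
```

The design point is that neither source had to adopt the other's schema: agreement on the identifier is enough for the statements to line up.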
  46. 46.
  47. 47. Credits • Population density image (public domain) • 2012 population estimate (CC-BY-SA) • Programmer estimates: Scaffidi, C.; Shaw, M.; Myers, Brad, "Estimating the numbers of end users and end user programmers," Visual Languages and Human-Centric Computing, 2005 IEEE Symposium on, pp. 207-214, 20-24 Sept. 2005, doi: 10.1109/VLHCC.2005.34 • End user programmer quote: Andrew J. Ko, Robin Abraham, Laura Beckwith, Alan Blackwell, Margaret Burnett, Martin Erwig, Chris Scaffidi, Joseph Lawrance, Henry Lieberman, Brad Myers, Mary Beth Rosson, Gregg Rothermel, Mary Shaw, and Susan Wiedenbeck. 2011. The state of the art in end-user software engineering. ACM Comput. Surv. 43, 3, Article 21 • Bag of chips idea: Open, Linked Data for a Global Community, Tim Berners-Lee, W3C, Gov2.0 Expo, Washington DC, May 25-27 2010. v=1E7lV5_0M38 • Social media icons: Courtesy of • Corporate and product logos, CAMC credit card image and book covers © their respective owners and used under Fair Use for educational purposes
  48. 48. This work is Copyright © 2011 3 Round Stones Inc. It is licensed under the Creative Commons Attribution 3.0 Unported License
 Full details at: You are free: • to Share: to copy, distribute and transmit the work • to Remix: to adapt the work. Under the following conditions: • Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). • Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.