FRS Linked Open Data Concept v1.3 20101130


Background: Presentation to Ecoinformatics International Technical Collaboration Partnership

International Web Meeting - Linked Open Data and Environmental Information

Day 1 – December 6, 2010
Geospatial Topic – Dave Smith

Published in: Technology

Transcript

  • 1. FRS and Linked Open Data Potential – Conceptual Discussion
    v 1.3, November 30, 2010
    Dave Smith, USEPA/OEI/OIC/IESD/ISSB
    smith.davidg@epa.gov, 202-566-0797

    Document Change History
    Revision  Date        Author          Description
    1.0       11/12/2010  David G. Smith  Initial version
    1.1       11/24/2010  David G. Smith  Minor updates/revisions as follow-on to 11/23 discussion
    1.2       11/29/2010  David G. Smith  Collaborations, potential pilots, FOAF and other models
    1.3       11/30/2010  David G. Smith  Additional collaborations and detail on facility granularity concept
  • 2. FRS Data Model Initial Conceptual Discussion, November 11, 2010, November 30, 2010

    Contents
    Document Change History
    Introduction
    Concept
    Current Situation
    Linked Open Data Issues
    Data Model Issues
    Linked Open Data Development
    Existing Resources
    Short-Term data needs
    Potential Pilots
    Longer-Range, Emergent data needs
    Other Ongoing, Related Activities

    Introduction:
    The intent of this concept paper is to initially explore some conceptual, blue-sky, no-constraints ideas for potential improvements to the FRS Linked Open Data approach being published via data.gov, and to stimulate additional ideas and brainstorming. The follow-on to this will be examination of alternatives, prioritization and finalization of thoughts toward implementation.

    Concept:
    Provide enhancements to the FRS Linked Open Data approach to improve analysis, enhance facility representation, improve robustness of LOD querying and analytics, integrate other existing metadata capabilities and improve capabilities to support Semantic Web approaches, such as more-informed RDF serialization.
  • 3. Current Situation:
    FRS data is currently being published via Data.gov, e.g. through the RDF button on the Data.gov catalog pages for FRS data (e.g. http://www.data.gov/raw/1030).

    Figure 1: Example of Current FRS RDF Offering (highlighted in red box)

    The data returned is tied to a data.gov URL, e.g. http://www.data.gov/semantic/data/alpha/1030/dataset-1030.rdf.gz

    Linked Open Data Issues:
    Currently, FRS and other datasets published via Data.gov are being serialized as RDF to support semantic web and linked open data. The basic problems with the Data.gov RDF are not specific to the FRS RDF data; they likely apply across the board.

    Firstly, in terms of access, the data is a gzipped download. Data must be downloaded and unzipped before it can be accessed. Ideally, Data.gov would serve the data up through a SPARQL endpoint, a Sesame repository, or some other means of exposing a triple store. The download/unzip paradigm does not lend itself to dynamic mashups.
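    To illustrate the access point above, the following is a minimal sketch of what a client could do if a SPARQL endpoint were available, instead of downloading and unzipping a file. The endpoint URL is a hypothetical placeholder (no such Data.gov service is described in this paper), and the property namespace is simply the one that appears in the Data.gov sample triples. The sketch only builds the request URL, using the `query` parameter defined by the SPARQL protocol, and does not contact any server.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: an assumption for illustration, not an actual Data.gov service.
ENDPOINT = "http://example.org/sparql"

# Property namespace as it appears in the Data.gov sample triples for this dataset.
FRS_NS = "http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#"


def build_sparql_request(state_code: str, limit: int = 10) -> str:
    """Return a SPARQL-protocol GET URL selecting facility names for one state."""
    if not state_code.isalpha():
        # Keep the literal safe to embed in the query text.
        raise ValueError("state_code must be alphabetic, e.g. 'NE'")
    query = (
        f"PREFIX frs: <{FRS_NS}>\n"
        "SELECT ?entry ?name WHERE {\n"
        f'  ?entry frs:state_code "{state_code}" ;\n'
        "         frs:primary_name ?name .\n"
        f"}} LIMIT {limit}"
    )
    # The SPARQL protocol passes the query text in the "query" parameter.
    return ENDPOINT + "?" + urlencode({"query": query})


url = build_sparql_request("NE")
print(url)
```

    A mashup could issue such a request on demand and receive only the matching rows, rather than fetching the entire gzipped dataset first.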
  • 4. With regard to the Data.gov RDF, it appears to be a brute-force serialization of data tables into RDF. It doesn't really have the semantic depth it would need to support analysis (see Fig. 1-3).

        <rdf:Description rdf:about="#entry9985">
          <hdatum_desc>NAD83</hdatum_desc>
          <state_name>NEBRASKA</state_name>
          <latitude83>40.944623</latitude83>
          <interest_types>STATE MASTER</interest_types>
          <city_name>GARLAND</city_name>
          <create_date>01-MAR-00</create_date>
          <frs_facility_detail_report_url rdf:resource="http://iaspub.epa.gov/enviro/fii_query_detail.disp_program_facility?p_registry_id=110006555085"/>
          <congressional_dist_num>01</congressional_dist_num>
          <pgm_sys_acrnms>NE-IIS</pgm_sys_acrnms>
          <epa_region_code>07</epa_region_code>
          <country_name>USA</country_name>
          <fips_code>31159</fips_code>
          <huc_code>10200203</huc_code>
          <collect_desc>ADDRESS MATCHING-HOUSE NUMBER</collect_desc>
          <primary_name>TERRI KELLER RESIDENCE</primary_name>
          <rdf:type rdf:resource="http://data-gov.tw.rpi.edu/2009/data-gov-twc.rdf#DataEntry"/>
          <ref_point_desc>ENTRANCE POINT OF A FACILITY OR STATION</ref_point_desc>
          <postal_code>683609338</postal_code>
          <registry_id>110006555085</registry_id>
          <location_address>1976 OLD MILL RD</location_address>
          <accuracy_value>30</accuracy_value>
          <update_date>06-AUG-01</update_date>
          <county_name>SEWARD</county_name>
          <conveyor>FRS</conveyor>
          <longitude83>-96.990306</longitude83>
          <state_code>NE</state_code>
          <site_type_name>STATIONARY</site_type_name>
        </rdf:Description>

    Figure 1: Sample of current Data.gov FRS RDF/XML Representation
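    The "brute-force" pattern described above can be sketched as follows. This is an illustration of the general technique only, not Data.gov's actual converter: every column of a table row becomes a property in the dataset's own namespace, and every value becomes an untyped string literal. The function name is hypothetical; the namespace and the sample values are taken from the record shown in this paper.

```python
# Dataset namespace as it appears in the Data.gov sample triples.
NS = "http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#"


def flat_row_to_ntriples(entry_id, row):
    """Serialize one table row as N-Triples, one plain string literal per column."""
    subject = f"<{NS}{entry_id}>"
    lines = []
    for column, value in row.items():
        # Escape backslashes and double quotes so the literal stays well-formed.
        literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} <{NS}{column}> "{literal}" .')
    return lines


# Three columns from the sample record; every value is carried as plain text.
row = {"state_code": "NE", "latitude83": "40.944623", "create_date": "01-MAR-00"}
triples = flat_row_to_ntriples("entry9985", row)
for line in triples:
    print(line)
```

    Nothing in this output distinguishes a date from a coordinate or an identifier, which is the lack of semantic depth noted above.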
  • 5.
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#hdatum_desc> "NAD83" .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#state_name> "NEBRASKA" .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#latitude83> "40.944623" .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#interest_types> "STATE MASTER" .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#city_name> "GARLAND" .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#create_date> "01-MAR-00" .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#frs_facility_detail_report_url> <http://iaspub.epa.gov/enviro/fii_query_detail.disp_program_facility?p_registry_id=110006555085> .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#congressional_dist_num> "01" .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#pgm_sys_acrnms> "NE-IIS" .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#epa_region_code> "07" .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#country_name> "USA" .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#fips_code> "31159" .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#huc_code> "10200203" .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#collect_desc> "ADDRESS MATCHING-HOUSE NUMBER" .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#primary_name> "TERRI KELLER RESIDENCE" .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://data-gov.tw.rpi.edu/2009/data-gov-twc.rdf#DataEntry> .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#ref_point_desc> "ENTRANCE POINT OF A FACILITY OR STATION" .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#postal_code> "683609338" .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#registry_id> "110006555085" .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#location_address> "1976 OLD MILL RD" .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#accuracy_value> "30" .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#update_date> "06-AUG-01" .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#county_name> "SEWARD" .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#conveyor> "FRS" .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#longitude83> "-96.990306" .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#state_code> "NE" .
        <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#site_type_name> "STATIONARY" .

    Figure 2: Sample of current Data.gov FRS Representation as Triples

  • 6. The current RDF serialization is essentially just a brute-force conversion - there is plenty of opportunity to enhance and improve.

    The properties are things that some EPA users might easily understand, but would others? Take huc_code or pgm_sys_acrnms, for example – are these uniquely identifiable and understood within this dataset? An imported reference to the EPA data dictionary, perhaps an EPA namespace, or some other means of defining them more positively is needed. We have a lot of metadata that we can bring into the mix, toward enhancing identifiability, understandability and usability of the RDF data.

    There isn't really much structure or model; it's essentially a flat table. Everything is just treated as an alphanumeric data type. There is no temporal intelligence to dates, et cetera. It doesn't identify registry ID as something unique or indexable. There are many things that can and should be defined better. There is probably a semantic analogue to our data model that we can develop in RDF/OWL/etc. and then map to.

    One approach which may make more sense is to go back and look at the relational database model, which can support more richness – essentially, individual tables and their relationships would be generated as Linked Open Data, and the SPARQL queries would then have the flexibility of current SQL queries.

    Regarding the properties, are there in some cases other namespaces that we could/should be leveraging? geo: is one example - our data is, however, NAD83, and geo: assumes WGS84. We could reproject to WGS84 and provide geo: values to supplement what we have, as one possibility. Similarly, maybe foaf: or other namespaces which deal with addresses and points of contact. The RDF only carries locations, but FRS also has contacts, should we at some point want to incorporate those as well.

    In summary, I think it could stand to be improved in two ways. The first is accessibility (SPARQL, et cetera - I think Data.gov needs to look at that from a services infrastructure standpoint). The second is usability: following more of a data model approach, as opposed to this flat mapping, and mapping to existing namespaces and following existing models where appropriate. We should be able to leverage some of our metadata elements, data models and other artifacts toward a better representation and mapping.
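    As a rough sketch of the kind of enrichment discussed above, the following gives dates and coordinates typed literals and uses the registry ID to mint a unique subject URI, instead of treating every value as plain text. The facility URI base is a hypothetical placeholder, not an actual EPA namespace, and the function names are illustrative. The property namespace is the one from the Data.gov sample, and the date format is assumed to match the "01-MAR-00" pattern in the sample record. No reprojection is attempted, so the coordinates remain NAD83 values.

```python
from datetime import datetime
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"
NS = "http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#"
# Hypothetical URI base for facilities; an assumption, not an actual EPA namespace.
FACILITY_BASE = "http://example.org/epa/frs/facility/"


def typed_date(value):
    """Turn a 'DD-MON-YY' string such as '01-MAR-00' into an xsd:date literal."""
    # Note: %y maps two-digit years 00-68 to 2000-2068 and 69-99 to 1969-1999.
    parsed = datetime.strptime(value, "%d-%b-%y").date()
    return f'"{parsed.isoformat()}"^^<{XSD}date>'


def typed_decimal(value):
    """Return an xsd:decimal literal, rejecting non-numeric input."""
    Decimal(value)  # raises decimal.InvalidOperation if value is not numeric
    return f'"{value}"^^<{XSD}decimal>'


def enrich(record):
    """Emit typed triples whose subject URI is derived from the registry ID."""
    subject = f"<{FACILITY_BASE}{record['registry_id']}>"
    return [
        f"{subject} <{NS}create_date> {typed_date(record['create_date'])} .",
        f"{subject} <{NS}latitude83> {typed_decimal(record['latitude83'])} .",
        f"{subject} <{NS}longitude83> {typed_decimal(record['longitude83'])} .",
    ]


# Values taken from the sample record in this paper.
sample = {
    "registry_id": "110006555085",
    "create_date": "01-MAR-00",
    "latitude83": "40.944623",
    "longitude83": "-96.990306",
}
enriched = enrich(sample)
for line in enriched:
    print(line)
```

    With typed values, a query engine can compare dates and numbers as such, and the registry-based URI gives other datasets a stable identifier to link to.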
  • 7. Data Model Issues:
    Long range, some additional tweaks to the FRS data model may be needed in order to enhance data representation and better support Linked Open Data - some of these are described in brief below.

    Linked Open Data Development:
    Potential collaboration with
    • Joshua Lieberman (OGC Geospatial Semantics SWG)
    • Spatial Ontology Community of Practice
    • Jim Hendler (RPI), George Thomas (HHS): CIO Council and Data.gov Geospatial Semantics threads
    • John Harman / Michael Pendleton (LOD, SRS)
    • Steve Young / Zach Scott / Open Gov Team (LOD)
    • Talis, pending contract (LOD)
    • TRI Program (Potential Pilot)
    • Kevin Kirby (Data Model)
    • Tom Giffen (Data Model, Business Rules)
    • Ken Blumberg (Business Rules)
    • Cindy Dickinson (Standards, Business Rules)
    • Others (program offices, regions, GISWG)

    Existing Resources
    • Leverage the data modeling work that Kevin Kirby has been doing
    • Drill into gist.owl and other potential resources

    Short-Term data needs:
    • Semantic Enhancements / Linked Open Data
      Improvement of capabilities for supporting Linked Open Data applications – analysis of data structure toward supporting faceted, dimensional analyses (Figure 3). Development of URI schemes, potentially namespaces, and means and approaches for allowing unique identification and linkage.

  • 8. [Figure 3: Potential Facets / Dimensions for Analysis and Semantic Enhancement. Labels in the diagram: Site; Spatial Dimension; Temporal Dimension; Organizational Dimension; Regulatory Dimension; People; Activity; Lat/Long; Physical USPS Address; Municipality; HUC Code; Site-level Organizational Affiliation; Ultimate Organizational Parent; Administrative POC; Legal POC; Operational POC; Program IDs; NAICS Code; SIC Code]

    • Semantic Dimensions: Explore various dimensions of facility:
      • Spatial
        o GML representation of absolute location (lat/long, etc.)
        o Spatial representation framework for facility (building footprints, parcel boundary, others for future)
        o Facility data modeling granularity and relationships - get a better handle on what the facility "thing" represents, and its relation to other things - for example, a parcel boundary, containing an industrial complex with manufacturing and storage buildings (differing NAICS, possibly even different companies operating and licensed/permitted), plus associated air stacks, SPCC measures, water outfalls, et cetera. When we pull up "facility" it should ultimately reflect that bigger picture for context, with the component of interest highlighted.

  • 9.   • Temporal
        o Data currency
        o Temporal aspects to regulation, enforcement, permitting, et cetera – future
      • Corporate Dimension
        o Corporate ownership – at facility level and at ultimate corporate parent level
      • Function - Activity and Use
        o NAICS/SIC Codes
        o EPA Regulatory program
        o EPA Interest Type
        o Linkages / translation between interest type and other ontologies/vocabularies
        o Linkages to regulatory programs and other components
      • Interrelationships of facilities (future)
      • Individuals
        o Friend-of-a-friend (FOAF) and other existing RDF constructs
      • Many other potential enhancements

    Potential Pilots
    A number of potential pilots for mashups can be considered. What may be "low-hanging fruit" for OEI would build upon known internal assets, i.e.:

  • 10. • FRS
    • TRI (Toxic Release Quantities for Given Location)
    • SRS (Substance)

    Potentially, as one scenario, one could tie TRI discharges to reaches via OW web services and TRI-reported receiving waters, and then tie this to observed impacts downstream.

    One caveat of using EPA data is that it is known to EPA users, but ideally needs to be more fully fleshed out to make it discoverable and uniquely identifiable for external users, perhaps via embedded EPA identifiers (perhaps an epa: namespace or similar means of identifying our assets).

    Other potential scenarios TBD… OECA targeted enforcement vs. OSHA, or OPP vs. USDA pesticides application data.

    Longer-Range, Emergent data needs:
    These are not specific to LOD, but are instead emergent attributes of interest for FRS – LOD approaches may help inform how to structure these.
    • HUC Codes
      Completing the prepopulation of HUC Codes can support identification of facilities impacting major watersheds, e.g. Chesapeake Bay (OECA need) – Other potential needs: Airsheds
    • Municipality
      Toward improving data quality – the physical street address may include a ZIP Code for a city which is different from the actual municipality where the site resides. For example, Suburban Drive, State College PA is actually in Ferguson Township, PA, so the local planning and building code officials and emergency responders who either have or need information on the facility of interest would be different from those of the municipality listed.
    • Relationship
      Ability to relate facilities – relating individual components of a larger system of infrastructure, such as relating a gas terminal to a compressor station – changes to one may impact others. Ability to organize information in appropriate fashions, such as relating multiple individual oil platforms with discrete permits to a lease boundary with another level of permitting.
    • Indian Country
      More robust identification/validation of facilities which may lie within tribal boundaries – refinement of IND-3 boundaries with other source data, analysis of flows containing either tribal flag (Y/N) and/or tribal identifier (tribe/reservation name) - (collaboration with Elizabeth Jackson / Ed Liu)

  • 11. • Facility Definition
      Potential broadening of scope and use of FRS to accommodate grant award locations and other types of locations – 2005 NAPA Report recommendations for consistent agencywide site identification. May be predicated on buildout of other capabilities, such as being able to relate sites.

    Other Ongoing, Related Activities
    A number of activities, internal and external, can help to inform the direction and data model for FRS data collection and publishing activities – some of these are listed below:
    • Potential EPA Corporate ID Workgroup
      Collaborate with TRI, TSCA, FRP, RMP and others who collect corporate parent information, as well as OECA and others who need corporate parent information to support analysis.
    • White House Corporate ID Workgroup
      Collaborate with emergent White House Corporate ID workgroup – Beth Noveck / Steve Croley, SEC, Labor and other agencies to align, coordinate and collaborate on corporate identifiers
    • OpenGov
      Collaboration with EPA Open Gov initiatives to inform how best to publish data for external reuse.
    • National Academy of Public Administration
      Follow-through on 2005 NAPA Report recommendations
    • Spatial Ontology Community of Practice (SOCOP)
      Collaboration on vocabularies, standards and data modeling approaches
    • Data.Gov Data Architecture Subgroup
      Collaboration on vocabularies, standards and data modeling approaches
    • EPA OEI/OIC/IESD Data Standards Branch
      Collaboration on vocabularies, standards and data modeling approaches
    • Others…

    Anticipated Next Steps:
    TBD, develop ideas for potential pilots, engage on "LOD Cookbook" and approaches for representing and rendering our data as RDF.
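    As one possible input to the pilot discussion, the "Relationship" need listed under the longer-range data needs (relating a gas terminal to a compressor station, where changes to one may impact others) can be sketched as a small graph traversal. All identifiers below are made-up placeholders, not FRS records, and the function name is hypothetical. In a Linked Open Data setting the same links would presumably be expressed as triples and followed by a query rather than held in a dictionary.

```python
from collections import deque

# Placeholder relationships between made-up facility identifiers (not FRS data).
RELATED = {
    "gas-terminal-A": ["compressor-station-B"],
    "compressor-station-B": ["gas-terminal-A", "pipeline-segment-C"],
    "pipeline-segment-C": ["compressor-station-B"],
    "oil-platform-D": [],
}


def potentially_impacted(start, related):
    """Return every facility reachable from `start` by following relationships."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in related.get(current, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)  # report only the other facilities
    return sorted(seen)


print(potentially_impacted("gas-terminal-A", RELATED))
```

    A breadth-first walk is used here only because it is simple; any traversal that follows the relationship links would serve the same purpose.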