Conflation, Data Quality and MADness (David Smith)

  1. Conflation, Data Quality and MADness
     ESRI Developer Meetup, June 7th, 2011
     USEPA Office of Environmental Information
     David G Smith PE PLS, 202-566-0797
     Smith.DavidG@epa.gov
     Twitter: @DruidSmith
     6/7/2011, U.S. Environmental Protection Agency
  2. Metadata??
  3. FRS Overview
     Facility Registry System
     • FRS is a data aggregator
     • FRS performs integration, validation and QA across over 30 federal databases and over 50 state, territory and tribal databases
     • FRS contains information on nearly 2.8 million facilities
     • More than 80% of facilities have lat/long information
  4. What FRS Does
     • FRS improves program facility data validity from 40% to 95% by selecting the best contact and location information from multiple data sources
     • Allows EPA, public, academic, and investment communities to evaluate compliance with environmental regulations
     • Provides a robust, complete view of facility information, facilitating cross-media analyses:
         • Community-based initiatives
         • Environmental justice analyses
         • NEPA assessments
         • Emergency response
         • Other mission needs (TMDL program, climate change analysis, etc.)
  5. FRS Features
     • Provides a more complete, holistic, cross-media view of key facility information through verification and data management procedures
     • Incorporates layers of quality control: the FRS record is checked for completeness, consistency, and validity and is owned by FRS
     • Integrates information from program national systems, state master facility records, tribal partners, and other federal agencies
     • Supported by a network of data stewards covering both geographic and programmatic areas of expertise
     • Fully integrated with the Locational Data and the Integrated Error Correction Process (IECP)
  6. FRS Features
     • Provides essential support for applications that rely on integrated views of facilities
         • GIS applications (EnviroMapper, MyEnvironment)
         • Public access applications (Envirofacts, Cleanups in My Community (CIMC))
         • Enforcement systems and applications (IDEA, OTIS, ECHO, ICIS)
     • Offers specialized services to applications in need of accurate facility information
         • Emergency Response
         • TRI-ME web
         • DMR Loadings Tool
     • Provides web services, enabling data exchanges with state partners on the Environmental Exchange Network
  7. FRS Scope
     Major Programs Represented in FRS
     • Air: AFS, AQS, CAMDBS, EGRID, NEI, RBLC, RFS (Ethanol)
     • Water: PCS, ICIS-NPDES, SDWIS, CWNS
     • Chemical Releases: TRIS, RMP, TSCA, SSTS, FRP, BRAC
     • Hazardous Waste: ACRES, CERCLIS, RCRAINFO, RADINFO
     • Enforcement/Compliance: ICIS, ECRM, NCDB
     • Schools: NCES, GNIS, BIA INDIAN SCHOOL
     • Other: LANDFILL
     http://www.epa.gov/enviro/html/frs_demo/new_crosswalks.html
  8. FRS Data Model
     High Level Data Model (diagram). Entities shown: Organization, Industrial Classification, Affiliation, Individual, Supplemental Interest, Mailing Address, Alternative Name, Facility/Site, Geospatial, Environmental Interest
  9. FRS Data Pipeline
  10. QA Process
  11. Integration?
     • Air Permit Coordinate
     • Water Permit Coordinate
     • Toxics Permit Coordinate
     Best Facility Coordinate?
  12. FRS Processing
     Q/A Enhancement, Data Collection, Data Publishing
     • At the publishing stage, extracts of the FRS geospatial database are provided as geospatial downloads
  13. The FRS geospatial database provides web services, database connections and spatial queries for a wide variety of web mapping applications, for example MyEnvironment, Cleanups in My Community, IDEA/ECHO/OTIS and many others
  14. For Title 40 regulated programs, CDX collects locational and parametric data for the program offices, and facility data goes to FRS.
  15. Several program offices have their own systems that collect and manage locational and parametric data. Envirofacts pulls data from these, and FRS serves as the locational component for Envirofacts.
  16. FRS contains many data enhancement, lookup and validation services that assist other CDX flows.
  17. FRS receives locational data updates and edits from regional data stewards as needed.
  18. Envirofacts pulls data from the program offices, taking in parametric data and sending locational data to FRS. FRS serves as the geospatial component of Envirofacts.
     Locational Data Accuracy and Best Pick
     • FRS utilizes the EPA Lat/Long Data Standard
     • Locational Reference Tables (LRT)
     • Method Accuracy Description (MAD)
     • Best Pick
  19. Locational Reference Table
     • All underlying information from programs is retained, including locational data
     • For any given facility, there may be multiple individual locations that have been gathered, e.g. an associated air stack location, water outfall location, front gate location, et cetera
     • MAD Codes help us assess how to handle locational data quality, as well as understand what the data represents
     http://www.epa.gov/enviro/html/locational/lrt_viewer.html
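As a rough illustration of the idea that every program-supplied location is retained, the Python sketch below groups several location records under one facility instead of keeping a single winner. The record fields, facility IDs, coordinate values, and the helper `locations_by_facility` are hypothetical and are not taken from the actual LRT schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationRecord:
    """One program-supplied location for a facility (hypothetical fields)."""
    facility_id: str      # hypothetical facility identifier
    source_program: str   # program that supplied the point, e.g. "AFS"
    description: str      # what the point represents
    latitude: float
    longitude: float


def locations_by_facility(records):
    """Group records by facility, keeping every record rather than one winner."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec.facility_id].append(rec)
    return dict(grouped)


# Illustrative values only; these are not real FRS data.
records = [
    LocationRecord("FAC-001", "AFS", "air stack", 38.8951, -77.0364),
    LocationRecord("FAC-001", "PCS", "water outfall", 38.8949, -77.0371),
    LocationRecord("FAC-001", "TRIS", "front gate", 38.8955, -77.0359),
    LocationRecord("FAC-002", "RCRAINFO", "front gate", 39.2904, -76.6122),
]

grouped = locations_by_facility(records)
for facility_id, recs in grouped.items():
    print(facility_id, [r.description for r in recs])
```

Keeping all points side by side is what lets a later step choose among them, or lets a user ask specifically for the outfall rather than the front gate.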
  20. MAD Codes
     MAD Codes help us assess how to handle locational data quality, as well as understand what the data represents
  21. MAD Codes
     http://www.exchangenetwork.net/standards/Lat_Long_Standard_08_11_2006_Final.pdf
  22. Match & Integrate Facility Data
     Scoring method: to determine whether two records are the same facility
     • 25 points: parsed street number
     • 50 points: matching standardized city name, standardized county name, state and zip
     • Score 100: an environmental interest is created for the source and associated with the matched FRS record
     • Score 50 to 100: FRS creates a new record and a new associated environmental interest; the new record is identified as having possible matches
     • Score <30: FRS creates a new record with a new interest
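The scoring rules on this slide can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the FRS implementation: the slide names only two scoring components (75 points in total), so `partial_score` covers just those two; the dictionary keys, both function names, and the sample records are hypothetical; and the sketch assumes the street-number points are awarded when the parsed numbers are equal. The slide does not say what happens for scores from 30 to 49, so the sketch reports that range as unspecified rather than guessing.

```python
# Point values named on the slide. Only these two components are listed there,
# so any further points needed to reach 100 come from rules not shown.
STREET_NUMBER_POINTS = 25
LOCALITY_POINTS = 50  # standardized city, county, state and zip all match


def partial_score(a, b):
    """Sum the two slide-listed components for records a and b (dicts)."""
    score = 0
    # Assumption: the street-number points require equal parsed numbers.
    if a.get("street_number") and a["street_number"] == b.get("street_number"):
        score += STREET_NUMBER_POINTS
    locality_keys = ("city", "county", "state", "zip")
    if all(a.get(k) and a[k] == b.get(k) for k in locality_keys):
        score += LOCALITY_POINTS
    return score


def classify(score):
    """Map a total score to the action described on the slide."""
    if score >= 100:  # assumption: 100 is the maximum score
        return "link interest to matched record"
    if score >= 50:
        return "new record, flagged with possible matches"
    if score < 30:
        return "new record"
    return "not specified on the slide"  # scores from 30 to 49


# Hypothetical records, already standardized.
rec_a = {"street_number": "1200", "city": "WASHINGTON",
         "county": "DISTRICT OF COLUMBIA", "state": "DC", "zip": "20460"}
rec_b = dict(rec_a, street_number="1201")

for other in (rec_a, rec_b):
    s = partial_score(rec_a, other)
    print(s, classify(s))
```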
  23. Select the “Best Pick” Information
     • FRS maintains a database table of manual verifications in the LRT.
     • EPA/Regional verifications trump State verifications.
     • Manually verified locations trump all the rest, regardless of calculated accuracy or QA checks.
     • In automated processing, Superfund NPL Site locations trump everything.
     • Our “normal” process is based on supplied or implied accuracy and QA checks performed (MAD codes).
     EPA Latitude/Longitude Data Standard (http://www.exchangenetwork.net/standards/Lat_Long_Standard_08_11_2006_Final.pdf)
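One way to read the precedence rules above is as a sort key, sketched here in Python. The `Candidate` fields, the tier numbers, and the tie-breaking inside the "normal" tier (QA-passed first, then smaller accuracy value) are assumptions made for illustration; the slide states only the order of precedence, not how FRS encodes it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """A candidate location for one facility (all fields hypothetical)."""
    label: str
    verified_by: Optional[str] = None  # "EPA" (EPA/Regional), "STATE", or None
    is_npl_site: bool = False          # Superfund NPL site location
    accuracy_m: float = float("inf")   # assumed: smaller value is more accurate
    passed_qa: bool = False


def precedence(c):
    """Sort key in which lower tuples win."""
    if c.verified_by == "EPA":
        tier = 0   # EPA/Regional manual verification
    elif c.verified_by == "STATE":
        tier = 1   # State manual verification
    elif c.is_npl_site:
        tier = 2   # automated processing: NPL site location
    else:
        tier = 3   # "normal" process: accuracy and QA checks
    return (tier, not c.passed_qa, c.accuracy_m)


def best_pick(candidates):
    """Return the candidate with the highest precedence."""
    return min(candidates, key=precedence)


# Illustrative candidates only.
geocoded = Candidate("geocoded address", accuracy_m=50, passed_qa=True)
npl = Candidate("NPL site point", is_npl_site=True, accuracy_m=500)
state = Candidate("state-verified gate", verified_by="STATE", accuracy_m=200)

print(best_pick([geocoded, npl, state]).label)
print(best_pick([geocoded, npl]).label)
```

Expressing the rules as a tuple keeps each "trumps" relationship in one place, so a change in policy is a change to the tier assignment rather than to scattered conditionals.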
  24. Business Case
     • Users benefit from high-quality integrated locational data for facilities, in support of enforcement, compliance, analysis, assessment and community impact
     • Being able to assess and manage large amounts of data of varying quality, e.g. volunteered geographic information (VGI)
  25. Thank You - URLs
