More than four million well bores have been drilled across approximately 32 states in the United States. Each state agency runs its own uncoordinated, autonomous data collection process, data model, and distribution method for its well data. This session discusses how WhiteStar uses FME to build an extensive set of dataflow models that regularly ingest the raw data, compute locations, verify elevations, perform data validation checks, and standardize the schema nationwide. We'll also highlight how FME is used to output data to a series of open source PostgreSQL 8.4 database structures.
Using FME to Compile, Validate and Maintain a 4 Million Oil and Gas Well Database
1. Using FME to Compile, Validate, and Maintain a Four Million Oil and Gas Well Database
2010: An FME Odyssey
Robert C. White, Jr.
President, WhiteStar Corporation
2. The Problem
4 Million Wells, 32 States, 7 Provinces
Building a Consistent Data Structure
Pre-processing and Loading Data
Data Validation Challenges
Update Challenges
Export Challenges
3. The Tools
Documentation from States and Provinces.
FME, particularly FME Workbench.
POSIX Tools
Various Datasets, Custom Programs
PostgreSQL – Open Source Database.
Last but not Least…
4. Varieties of Source Data
File Type Occurrences
CSV Files 7
Excel 5
Access Tables 4
Web Site Scrapes 2
Manual Data Entry 4
dBase Files 5
Card Records, EBCDIC 3
Card Records, ASCII 5
Shape Files 3
Arc Export 1
Open Records Appeal 1
5. Inventing a Data Model
Gathered Available Documentation
Scanned all Data Fields
Grouped by Survey Type, Name, Size
Looked at PPDM.
6. Building the Data Structure
Used PostgreSQL to Input the Structure
FME Readers to Determine Field Lengths
PostgreSQL Field Types Helped Determine Validity.
Discovered Use of StringSearcher.
E.g., 660FNL 660FWL should be 4 fields.
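The footage-call example above can be sketched outside FME as a regular-expression split. This is only an illustration of the idea, not the StringSearcher configuration used in the workspaces; the pattern, the function name `split_footage`, and the choice of output fields (distance, reference line, distance, reference line) are assumptions.

```python
import re

# Assumed pattern: a distance in feet followed by a reference line
# such as FNL, FSL, FEL or FWL ("from north line", and so on).
FOOTAGE_CALL = re.compile(r"(\d+(?:\.\d+)?)\s*(F[NSEW]L)", re.IGNORECASE)


def split_footage(text):
    """Split a string like '660FNL 660FWL' into four fields.

    Returns (distance_1, line_1, distance_2, line_2), or None when the
    text does not contain exactly two footage calls.
    """
    calls = FOOTAGE_CALL.findall(text)
    if len(calls) != 2:
        return None
    (dist_1, line_1), (dist_2, line_2) = calls
    return (float(dist_1), line_1.upper(), float(dist_2), line_2.upper())


print(split_footage("660FNL 660FWL"))
```

Returning None for anything other than two calls keeps malformed values visible for review instead of loading partial locations.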
7. Pre-Processing Data
POSIX Type Tools, i.e. Linux Tools
Convert EBCDIC to ASCII (dd)
Edit Large Files to Delete “Junk” (vi)
Unzip Files (unzip, uncompress)
Untar Files (tar)
Pattern Processing (awk)
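The EBCDIC-to-ASCII step was done with dd. As a rough equivalent for readers without POSIX tools, the sketch below decodes fixed-length card records in Python. The 80-byte record length and the `cp037` code page are assumptions; the translation table that dd applies may differ from `cp037` for some characters, and the function name is hypothetical.

```python
def ebcdic_cards_to_text(raw, record_length=80, codec="cp037"):
    """Decode fixed-length EBCDIC card records into newline-separated text.

    raw           -- bytes read from the source file
    record_length -- bytes per card image (80 is assumed here)
    codec         -- EBCDIC code page (cp037 is an assumption)
    """
    lines = []
    for start in range(0, len(raw), record_length):
        record = raw[start:start + record_length]
        # Decode one card and drop the trailing blank padding.
        lines.append(record.decode(codec).rstrip())
    return "\n".join(lines)


# Build two sample card images padded with EBCDIC spaces (0x40).
sample = ("WELL 42".encode("cp037").ljust(80, b"\x40")
          + "API 001".encode("cp037").ljust(80, b"\x40"))
print(ebcdic_cards_to_text(sample))
```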
8. Data Validation Problems
Dates are Problems…
19000000
02/31/2010
Jan 29-30, 1995
14 Days
Need for some Robust Date Validation Transformers.
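The date problems listed above can be illustrated with a strict parser that accepts only known layouts and rejects everything else. This is a minimal sketch, not an FME transformer; the two accepted formats and the function name `parse_well_date` are assumptions chosen to match the examples on this slide.

```python
from datetime import date, datetime

# Assumed input layouts: compact YYYYMMDD and US-style MM/DD/YYYY.
ACCEPTED_FORMATS = ("%Y%m%d", "%m/%d/%Y")


def parse_well_date(text):
    """Return a date for a recognizable, real calendar date, else None."""
    text = text.strip()
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            # Wrong layout or an impossible date such as month 00 or Feb 31.
            continue
    return None


for value in ("20100315", "19000000", "02/31/2010", "Jan 29-30, 1995", "14 Days"):
    print(value, "->", parse_well_date(value))
```

Only the first value parses; the other four return None, so they can be routed to manual review rather than silently loaded.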
9. Data Validation Challenges
Missing Coordinates
Consistently calculate them to a Land Grid using an External Program.
County Name Lookup (ValueMapper)
Non-Unique API Numbers
Missing API Numbers
Check for “Reasonable” Values.
Check if You’re in the Geography you Expect.
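Several of the checks above can be sketched as a single pass over the well records. The record layout (dicts with `api`, `lon`, `lat` keys), the function name, and the use of a simple bounding box for the "expected geography" test are all assumptions; a production check would more likely test against a state or county polygon.

```python
from collections import Counter


def validate_wells(wells, bbox):
    """Return a list of (record_index, problem) tuples.

    wells -- list of dicts with 'api', 'lon' and 'lat' keys (assumed layout)
    bbox  -- (min_lon, min_lat, max_lon, max_lat) of the expected area
    """
    api_counts = Counter(w["api"] for w in wells if w.get("api"))
    min_lon, min_lat, max_lon, max_lat = bbox
    problems = []
    for index, well in enumerate(wells):
        api = well.get("api")
        if not api:
            problems.append((index, "missing API number"))
        elif api_counts[api] > 1:
            problems.append((index, "non-unique API number"))
        lon, lat = well.get("lon"), well.get("lat")
        if lon is None or lat is None:
            problems.append((index, "missing coordinates"))
        elif not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
            problems.append((index, "outside expected geography"))
    return problems
```

Reporting problems by record index rather than by API number matters here, because the API number is exactly the field that may be missing or duplicated.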
12. Updating the Data
Update with Minimal Intervention
Save the previous database.
Compare New to Old
Set the Update Date.
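The compare-and-stamp step above might look like the following sketch. The data layout (dicts keyed by API number), the `update_date` field name, and the decision to carry forward wells that are absent from the new load are assumptions; the slides do not say how dropped records are handled.

```python
from datetime import date


def apply_update(old, new, today=None):
    """Merge a new load into the previous database snapshot.

    old, new -- dicts mapping API number to a dict of attributes
    Records whose attributes changed (or that are new) get 'update_date'
    set to today; unchanged records keep their previous date.
    """
    today = today or date.today()
    merged = {api: dict(attrs) for api, attrs in old.items()}
    for api, attrs in new.items():
        previous = old.get(api)
        if previous is not None:
            # Compare everything except the bookkeeping field.
            comparable = {k: v for k, v in previous.items() if k != "update_date"}
            if comparable == attrs:
                continue  # unchanged, keep the old record and date
        merged[api] = dict(attrs, update_date=today)
    return merged
```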
13. Export Challenges
Software Clients Require Specific Formats
GeoGraphix, Petra, etc.
Used Text writer to meet these challenges.
Sorted on Formation Top Depth
Non-Unique Identifiers.
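As an illustration of the export step, the sketch below sorts formation tops by well and depth and writes fixed-width text lines. The column widths and the function name are hypothetical; this is not the actual GeoGraphix or Petra import layout, which each client application defines for itself.

```python
def format_tops(tops):
    """Return fixed-width text lines for formation tops.

    tops -- list of (api, formation, depth_ft) tuples
    Lines are ordered by API number, then by increasing top depth.
    The 14/20/8 character columns are an assumed layout.
    """
    ordered = sorted(tops, key=lambda top: (top[0], top[2]))
    return [
        f"{api:<14}{formation:<20}{depth:>8.1f}"
        for api, formation, depth in ordered
    ]


tops = [
    ("4200100001", "ELLENBURGER", 9100.0),
    ("4200100001", "AUSTIN", 5200.0),
]
for line in format_tops(tops):
    print(line)
```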
14. Summary
4 Million Wells, 32 States, 7 Provinces
Building a Consistent Data Structure
Pre-processing and Loading Data
Data Validation Challenges
Update Challenges
Export Challenges
15. Thank You!
Questions?
For more information:
Robert White – rwhite@whitestar.com
WhiteStar Corporation
http://www.whitestar.com
dd.exe – http://www.chryscosome.net/dd
Other POSIX tools
http://www.cs.nmsu.edu/~jeffery/win32