Cleansing land ownership data: an FME use case
David Eagle
Principal Consultant
david.eagle@1spatial.com
@david_eagle
Agenda
• 1Spatial
• Asset management: the case for good data
• The data challenge
• Technical solution
– Regex and Lists
• Benefits
1Spatial
• Founded in 1969
– Part of the Cambridge Tech Cluster
• Headquarters in Cambridge, UK
– International offices in Australia, Ireland, Belgium & France
• A group of innovative, market-leading technology companies:
Our Customers
• A specialist provider to National Mapping and Charting Agencies, Government, Defence and Utilities
Our Partners
Customer Case Study
• Fisher German
– Multi-discipline firm of Chartered Surveyors, Town Planners, Property Consultants & Specialist Engineers
– Management of:
• 4,000 km of high-pressure oil pipeline
• 2,500 km fibre network
– Creators of:
• www.linesearchbeforeudig.co.uk, a free-to-use enquiry tool used by BT, HA, utilities, local government, etc.
• >45 members with protected assets such as:
Linear Asset Management
• Key role is the management and protection of buried and overhead assets:
– High-pressure oil and gas pipelines
– Fibre optics
– Overhead power lines
• Need to ensure access to assets for inspection, maintenance, upgrade and safety.
• Document, maintain and manage details of land ownership in the vicinity of assets.
Why is Linear Asset Management Important?
Hunton Hill – Birmingham
Shop – new gas supply connection
25mm PE connection to a 150mm cast iron main
1hr job!
Found 300mm steel pipe
Drilled anyway
3hrs later…
A close call…
Cross-section and cut-out highlighting the carrier pipe and epoxy shell repair
5mm wall
0.5mm left
Petrol pressure 100 bar (about 1,450 psi)
Gas main is 100 psi
The importance of accurate data
• Ownership rights – gas pipe and pond in Dorset
• Incorrect grantor was on the mailing list
• Land Registry data saves the day
The systems
Before
• Asset management system – UDB
• Desktop GIS – spatial data managed and edited
– No synchronisation and some duplication
After
• Database extended to support spatial data
• Single data source served to UDB and desktop
• Addition of a view-only web client
• Data editing via WFS-T
Mitigating the risk
• New project = new desk exercise
• Data is purchased from the Land Registry
• Known ownership along the alignment is collated
• Site visits enhance ownership details:
– Access points
– Difficult access
– Tenants
– Where exactly is the asset?
– Dogs!
Data to feed the systems
• At the start of a project it’s necessary to collate a number of datasets
• Project inputs:
1. Existing asset data and records
2. Route corridor
3. Land Registry Shapefile and CSV
4. On-site inspection data
5. Constraints mapping – Environmental Stewardship, Common Land Register
6. Other external datasets
The process
• Manual QA and formatting steps:
1. Processing the CSVs into the required schema
2. Merging with the cleaned and aggregated geospatial data
3. Importing into the online management systems
• The manual process could take several days and involve 2 or 3 people
– Each project can have over 10,500 title deeds & 7,000 grantors
• 300 grantors = 2 days of manual effort
Land Registry – Attributes
• Fundamental, but presents some challenges
• The deed address details are supplied in a CSV:
– Title Number – title reference number
– Tenure – Freehold etc.
– Proprietor – full name and address
– Address – description of the position of the address/land
• An extra fee buys a ‘slightly’ better structure
• It still requires significant manual effort to format
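As a rough illustration of what arrives in the CSV, the sketch below reads it with Python's standard `csv` module. It assumes the header row uses the four column names listed above; the real Land Registry export may name or order its columns differently, and the title numbers and rows here are invented. Note that the leading and doubled spaces inside Proprietor survive parsing, which is what the splitting challenges later rely on.

```python
import csv
import io

# Invented sample data; header names follow the slide, real exports may differ.
SAMPLE_CSV = (
    "Title Number,Tenure,Proprietor,Address\n"
    'TT1001,Freehold," SOUTH EASTERN POWER NETWORKS PLC  Newington House, '
    '99 Southwark Bridge Street, London SN1 1AB ",Land at Big Farm\n'
    'TT1002,Freehold,"JOHN EDMUND SMITH    Big Farm, Preston, Canterbury, Kent ",Big Farm\n'
)


def read_titles(text):
    """Return one dict per title deed, keyed by the CSV header names."""
    return list(csv.DictReader(io.StringIO(text)))


titles = read_titles(SAMPLE_CSV)
for row in titles:
    # repr() makes the leading and doubled spaces in Proprietor visible.
    print(row["Title Number"], row["Tenure"], repr(row["Proprietor"]))
```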
Land Registry – Geometry
• All geometry (each title polygon) is held in an ESRI Shapefile
• Many polygons are split into a number of pieces
• The Land Registry holds and exports the data tiled
• Features are not aggregated on export
• The geometry needs joining to the attributes via the PK
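The aggregation and join can be pictured in plain Python. In FME this kind of step is typically an Aggregator grouped on the key followed by a FeatureMerger; the sketch below only mimics the idea, assuming the title number is the shared key and representing each polygon piece as a list of coordinate pairs (all values invented). Aggregating here means collecting the pieces into one multi-part feature per title, not dissolving their shared edges.

```python
from collections import defaultdict

# Invented tiled pieces: (title number, ring of (x, y) coordinates).
pieces = [
    ("TT1001", [(0, 0), (10, 0), (10, 10), (0, 10)]),
    ("TT1001", [(10, 0), (20, 0), (20, 10), (10, 10)]),  # same title, next tile
    ("TT1002", [(0, 10), (5, 10), (5, 15), (0, 15)]),
    ("TT1003", [(30, 0), (35, 0), (35, 5), (30, 5)]),    # no matching CSV row
]

# Invented attribute rows keyed by title number.
attributes = {
    "TT1001": {"Tenure": "Freehold", "Proprietor": "JOHN EDMUND SMITH"},
    "TT1002": {"Tenure": "Leasehold", "Proprietor": "SOUTH EASTERN POWER NETWORKS PLC"},
}


def aggregate_pieces(pieces):
    """Group every piece of a title into one multi-part geometry (a list of rings)."""
    grouped = defaultdict(list)
    for title, ring in pieces:
        grouped[title].append(ring)
    return dict(grouped)


def join_attributes(geometries, attributes):
    """Attach attributes by title number; return unmatched titles separately."""
    merged, unmatched = {}, []
    for title, parts in geometries.items():
        if title in attributes:
            merged[title] = {"parts": parts, **attributes[title]}
        else:
            unmatched.append(title)
    return merged, unmatched


merged, unmatched = join_attributes(aggregate_pieces(pieces), attributes)
print(len(merged["TT1001"]["parts"]), unmatched)
```

Keeping the unmatched titles rather than discarding them gives a ready-made QA list of geometry with no attribute record.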
What is FME?
• Industry-standard translation and transformation software
• Supports >300 formats
• Allows manipulation of many data types:
The case for FME
• FME is often bought for a specific task.
• The value comes when it’s used for tasks not previously considered
– Fisher German’s initial impetus was loading their database
• They later turned to FME to clean and conflate their data
• Building a case for FME wasn’t necessary
– Re-use the flexible technology and get a better ROI
Automate and re-use
• Automate out the mundane with FME
• Avoid hours of Excel copy/paste
• Allow staff to focus on the analysis
• First task: process 6 linear asset project files
• 24,000 Land Registry records processed in 30 seconds with FME
– Previously this would have taken >6 days.
• Subsequent steps clean up the geometry and merge the attributes, but this is a classic FME task!
Automate and re-use

• Lots of Testers/TestFilters
• Popular Transformers: http://goo.gl/4rOGf
– Adopt an “if, then, else” approach
• FME 2013 SP1 is more capable with ‘Conditional Mapping’
– http://evangelism.safe.com/fmeevangelist113/
• The success of the process relies on two capabilities:
1. Lists
2. Regex
Lists

• A list is a method by which FME permits a single attribute to hold multiple values
Polygon contains 12 trees
tree.Species{0} oak
tree.Species{1} ash
tree.Species{2} birch
tree.Species{3} oak
tree.Species{4} birch
tree.Species{5} birch
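A Python list is a reasonable mental model for an FME list attribute. The sketch below flattens a list into the indexed `name{i}` form shown above and counts the species; the helper name is hypothetical, and `len()` stands in for what FME's ListElementCounter reports.

```python
from collections import Counter

# The six values shown on the slide, held as one Python list.
species = ["oak", "ash", "birch", "oak", "birch", "birch"]


def to_fme_list_attributes(list_name, values):
    """Flatten values into FME-style indexed names, e.g. tree.Species{0}."""
    return {f"{list_name}{{{i}}}": value for i, value in enumerate(values)}


attrs = to_fme_list_attributes("tree.Species", species)
for name, value in attrs.items():
    print(name, value)

print(len(species))                     # element count
print(Counter(species).most_common(1))  # most frequent species and its count
```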
Challenge 1: Split the ‘Proprietor’ into ‘Name’ & ‘Address’
“ SOUTH EASTERN POWER NETWORKS PLC Newington House, 99 Southwark Bridge Street, London SN1 1AB ”

• Tester – Pass: if Proprietor begins with <space>
• AttributeSetter: it’s a Commercial business
• AttributeSplitter: split on 2 <spaces> and trim whitespace
– proprietor.Proprietor{0} SOUTH EASTERN POWER NETWORKS PLC
– proprietor.Proprietor{1} Newington House, 99 Southwark Bridge Street, London SN1 1AB
• AttributeRenamer:
– Name = SOUTH EASTERN POWER NETWORKS PLC
– Address = Newington House, 99 Southwark Bridge Street, London SN1 1AB
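The splitting step can be approximated in Python with `str.split` on an exact delimiter. This is a sketch of the idea, not the AttributeSplitter's implementation: the helper name is hypothetical, and dropping empty parts is a choice made here that may not match FME's behaviour. The sample string assumes the two-space gap between name and address that the extracted slide text has collapsed.

```python
def split_attribute(value, delimiter):
    """Split on an exact delimiter, trim each part and drop empty parts."""
    return [part.strip() for part in value.split(delimiter) if part.strip()]


proprietor = (
    " SOUTH EASTERN POWER NETWORKS PLC  "
    "Newington House, 99 Southwark Bridge Street, London SN1 1AB "
)
parts = split_attribute(proprietor, "  ")  # delimiter is exactly two spaces
record = {"Name": parts[0], "Address": parts[1]}
print(record)
```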
Challenge 1: Split the ‘Proprietor’ into ‘Name’ & ‘Address’
“JOHN EDMUND SMITH    Big Farm, Preston, Canterbury, Kent ” *
• Tester – Fail: Proprietor did NOT begin with <space>
• AttributeSetter: it’s a Residential property
• AttributeSplitter: split on 4 <spaces> and trim whitespace
– proprietor.Proprietor{0} JOHN EDMUND SMITH
– proprietor.Proprietor{1} Big Farm, Preston, Canterbury, Kent
• AttributeRenamer:
– Name = JOHN EDMUND SMITH
– Address = Big Farm, Preston, Canterbury, Kent
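Across both Challenge 1 cases, the Tester's pass/fail routing amounts to an if/else. The sketch below is a hypothetical Python equivalent. It assumes, as the two examples suggest, that a leading space marks a commercial owner whose name and address are separated by 2 spaces, and that everything else is residential with a 4-space gap. Real files may break these rules, so the sketch adds a hypothetical `NeedsReview` flag when no address is found.

```python
def classify_and_split(proprietor):
    """Route a Proprietor value like the Tester, then split name from address."""
    if proprietor.startswith(" "):      # Tester passes
        owner_type, delimiter = "Commercial", "  "
    else:                               # Tester fails
        owner_type, delimiter = "Residential", "    "
    # partition() splits at the first delimiter only; address is "" if absent.
    name, _, address = proprietor.partition(delimiter)
    return {
        "Type": owner_type,
        "Name": name.strip(),
        "Address": address.strip(),
        "NeedsReview": not address.strip(),  # hypothetical flag for odd records
    }


for value in (
    " SOUTH EASTERN POWER NETWORKS PLC  Newington House, 99 Southwark Bridge Street, London SN1 1AB ",
    "JOHN EDMUND SMITH    Big Farm, Preston, Canterbury, Kent ",
):
    print(classify_and_split(value))
```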
Challenge 2: Split the Address into appropriate parts
“Newington House, 99 Southwark Bridge Street, London SN1 1AB”
• AttributeSplitter: split on , (comma) and trim whitespace
– proprietor.Address{0} Newington House
– proprietor.Address{1} 99 Southwark Bridge Street
– proprietor.Address{2} London SN1 1AB
• ListElementCounter = 3
• AttributeRenamer:
– Address1 = Newington House
– Address2 = 99 Southwark Bridge Street
– Town = London SN1 1AB
• Depending on the data, 3 elements may or may not include a postcode!
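The split, count and rename sequence can be sketched as one Python function. Only the three-element mapping (Address1, Address2, Town) comes from the slide; the numbered fallback for other counts is a hypothetical choice, and, as noted, Town may still carry a postcode.

```python
def split_address(address):
    """Split on commas, trim, count the parts, then name them by count."""
    parts = [p.strip() for p in address.split(",") if p.strip()]
    count = len(parts)  # the ListElementCounter step
    if count == 3:
        keys = ["Address1", "Address2", "Town"]
    else:
        # Hypothetical fallback: number the parts and leave them for review.
        keys = [f"Address{i + 1}" for i in range(count)]
    return dict(zip(keys, parts))


print(split_address("Newington House, 99 Southwark Bridge Street, London SN1 1AB"))
print(split_address("Big Farm, Preston, Canterbury, Kent"))
```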
Regex
• Regular expressions are a language used for:
– Pattern matching
– String searching
– String parsing
– String replacement
/FME/
“We love FME 2013!” – match
“FME is great!” – match
/^FME/
“We love FME 2013!” – no match
“FME is great!” – match
/colou?r/
“FME is colourful!” – match
“FME is colorful!” – match
^ anchors the match to the start of the string
$ anchors the match to the end of the string
? makes the preceding character optional
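The basic patterns can be checked with Python's `re` module. `re.search` looks for a match anywhere in the string, which mirrors the examples above; regex dialects differ slightly between tools, but these basic constructs behave the same in Python. The `matches` helper is hypothetical, and the `$` example string is chosen here for illustration.

```python
import re


def matches(pattern, text):
    """True if the pattern occurs anywhere in the text."""
    return re.search(pattern, text) is not None


examples = [
    (r"FME", "We love FME 2013!"),      # found mid-string
    (r"FME", "FME is great!"),
    (r"^FME", "We love FME 2013!"),     # ^ fails: the string starts with "We"
    (r"^FME", "FME is great!"),
    (r"colou?r", "FME is colourful!"),  # ? lets the "u" be present...
    (r"colou?r", "FME is colorful!"),   # ...or absent
    (r"2013!$", "We love FME 2013!"),   # $ anchors to the end
]

for pattern, text in examples:
    print(f"/{pattern}/ {text!r}: {matches(pattern, text)}")
```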
Challenge 3: Spot the Postcode

• Regex = pattern matching and string manipulation
• http://rubular.com/ – helps you test!
String: AGI NORTH
Regex: ([A-Z]*)[ ]([A-Z]*)
String: London SN1 1AB
Regular Expression: ^(.*\S)\s+(\S{2,4}\s\S{3})\s*$
• Use the StringSearcher: its Matched output port provides…
– _matched_parts{0} London
– _matched_parts{1} SN1 1AB
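Both patterns can be tried in Python, where capture groups play the role of the StringSearcher's `_matched_parts` list. This is a minimal sketch and the function name is hypothetical. The postcode pattern checks shape only: some town text, whitespace, then a 2 to 4 character token, a space and a 3 character token at the end.

```python
import re

# \S = non-space, \s = whitespace: town text, then a 2-4 char and a 3 char token.
POSTCODE_SPLIT = re.compile(r"^(.*\S)\s+(\S{2,4}\s\S{3})\s*$")


def split_town_postcode(value):
    """Return (town, postcode) when the value ends in a postcode-shaped pair, else None."""
    m = POSTCODE_SPLIT.match(value)
    return m.groups() if m else None


print(re.search(r"([A-Z]*)[ ]([A-Z]*)", "AGI NORTH").groups())
print(split_town_postcode("London SN1 1AB"))
print(split_town_postcode("Canterbury"))
```

Because the test is purely on shape, a place name such as "Hay on Wye" also matches, splitting into ("Hay", "on Wye"). A stricter UK postcode pattern, or a lookup against a postcode list, would reduce such false positives.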
There were lots more challenges on a similar theme…
Other tasks: Structure and Schema
• Remove duplicate records
• Apply a common format to names, e.g. A A Smith to A.A. Smith
• Resolve addresses listed twice in the same string
– Common where 2 partners live at the same address
– “2, High Street, Leicester 2 High Street Leicester”
• Apply Title Case to names & tidy up the use of hyphens
• Add extra columns and fixed values for the target schema
• Split first names and last name into 2 columns – more Regex!
• Validate the County names against a list of allowed Counties & resolve abbreviations – AttributeValueMapper
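Several of these clean-up tasks reduce to small string functions. The Python below sketches one way to do each and is not the workflow Fisher German used. All function names are hypothetical, the county lookup is an invented sample standing in for an AttributeValueMapper table, and `str.title()` is a simple choice that will mishandle names such as McDonald.

```python
import re


def remove_duplicates(records):
    """Drop repeated records (tuples) while keeping first-seen order."""
    return list(dict.fromkeys(records))


def format_initials(name):
    """'A A Smith' -> 'A.A. Smith': join leading single-letter initials with dots."""
    words = name.split()
    initials = []
    while words and len(words[0]) == 1 and words[0].isalpha():
        initials.append(words.pop(0))
    prefix = "".join(f"{letter}." for letter in initials)
    return " ".join(([prefix] if prefix else []) + words)


TOKEN = re.compile(r"[A-Za-z0-9]+")


def dedupe_repeated_address(address):
    """Keep the first copy when the same words appear twice in a row."""
    tokens = list(TOKEN.finditer(address))
    words = [t.group().lower() for t in tokens]
    half = len(words) // 2
    if half and len(words) % 2 == 0 and words[:half] == words[half:]:
        return address[: tokens[half].start()].rstrip(" ,")
    return address.strip()


def tidy_name(name):
    """Collapse spaces, close up spaced hyphens, then apply title case."""
    name = " ".join(name.split())
    name = re.sub(r"\s*-\s*", "-", name)
    return name.title()


def split_first_last(full_name):
    """Split into (first names, last name) at the final run of whitespace."""
    m = re.match(r"^\s*(.*\S)\s+(\S+)\s*$", full_name)
    return (m.group(1), m.group(2)) if m else ("", full_name.strip())


# Invented sample lookup, standing in for an AttributeValueMapper table.
ALLOWED_COUNTIES = {"Dorset", "Gloucestershire", "Kent", "Leicestershire"}
COUNTY_ABBREVIATIONS = {"Glos": "Gloucestershire", "Leics": "Leicestershire"}


def map_county(value):
    """Expand known abbreviations; return None for counties not on the list."""
    cleaned = value.strip().rstrip(".")
    name = COUNTY_ABBREVIATIONS.get(cleaned, cleaned)
    return name if name in ALLOWED_COUNTIES else None


print(format_initials("A A Smith"))
print(dedupe_repeated_address("2, High Street, Leicester 2 High Street Leicester"))
print(tidy_name("JOHN SMITH - JONES"))
print(split_first_last("John Edmund Smith"))
print(map_county("Leics."), map_county("Atlantis"))
```

Returning `None` for an unknown county, rather than passing the value through, makes unresolved entries easy to route to manual review.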
Summary

• Saves time
– Before: >1 day of data prep per project
– After: using FME, a few seconds to do 80% of the work
• Saves money
– No extra fee to the Land Registry to restructure the data
– No unnecessary staff time on mundane formatting tasks
• Increased ROI
– Fisher German already had FME
– Just consider what else you could adapt FME to do…
Thank you

David Eagle
Principal Consultant
david.eagle@1spatial.com
@david_eagle
