Historic Postcode Directories - Progress and
Plans
Postcode GeoReferencing User Group, 5th April
James Crone, EDINA.
Overview
• About EDINA
• Project Background and Context
• Progress To Date
• Plans for coming months
• Outstanding Issues
EDINA
• A JISC funded national data centre based at Edinburgh University Data
Library.
• Provides the UK tertiary education and research community online access to a
library of data, information and research resources.
• The largest section of which (Geo Data Services), comprised of GIS
Specialists and Software Engineers provides access to 2 key online services -
Digimap & UKBORDERS.
• We and our user community have an interest in both contemporary and
historical postcode products.
Background & Context
• What are the historical postcode directories? - datasets which list all unit
postcodes within the UK and assign to each a national grid reference,
geographic lookups and counts of assigned addresses.
• ESRC has purchased Gridlinked versions of AFPD (2001-2006) for use by the
academic community. This community also has an interest in historic versions
of the AFPD and thus ONS supplied to ESRC historic postcode directories
(1980-2000) for free on the basis that ESRC would QA the historic versions.
• All versions of the postcode directories received by ESRC have been
available to users through the EDINA UKBORDERS service since October
2004.
• Steady stream of user downloads. Data for census years most popular but
interestingly significant interest in non-census years.
Deliverables
• Objectives/Deliverables of the QA set out formally in August 2004 MOU
between ESRC & ONS:
• Key Deliverable is a Quality Controlled postcode instance database spanning
1980 to present day. From this ESRC will derive snapshot historical versions of
the postcode directories replacing the versions of unknown quality that are
currently in existence.
• Postcode Instance - defined as the existence of a postcode for a certain period
of time which is unique on both postcode label and date of introduction.
• Postcode Instance = Postcode Label + Date of Introduction
• Instance db will have a number of fields – Date of Introduction (DOI), Date of
Termination (DOT), most recent easting & northing and higher geography
lookups (1991 ED/OA; 1998 Ward; 2001 OA).
• The ONS Ward History Database will be used to check the veracity of ward
codes within the historic versions of the postcode directories.
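As a rough illustration of the definition above, a postcode instance could be modelled as a small record keyed on label plus date of introduction. The class name, field names, the "YYYYMM" date format and the postcode label used here are assumptions made for this sketch, not the actual schema of the instance database.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PostcodeInstance:
    postcode: str                   # postcode label (made-up example: "ZZ1 1AA")
    doi: str                        # date of introduction, assumed "YYYYMM"
    dot: Optional[str] = None       # date of termination, None while still live
    easting: Optional[int] = None   # most recent easting, in metres
    northing: Optional[int] = None  # most recent northing, in metres

    @property
    def key(self) -> Tuple[str, str]:
        # Postcode Instance = Postcode Label + Date of Introduction
        return (self.postcode, self.doi)

    @property
    def is_live(self) -> bool:
        return self.dot is None


first = PostcodeInstance("ZZ1 1AA", "198001", dot="198906")
reuse = PostcodeInstance("ZZ1 1AA", "199301")
```

Here the two records share a label but have different keys, so they count as two instances of the same postcode.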
Progress to Date
• 4 sequential work phases to complete these objectives:
• I. Data Loading (complete)
• II. Quality Assurance I - Audit (complete)
• III. Quality Assurance II - Verification (in progress)
• IV. Production of Historic Snapshots
• The first 2 of these are complete and we are currently engaged in
the verification phase.
• ... Taking each phase in turn
Phase I – Data Loading
• Postcode directories were supplied by ONS from 1980 to present day.
• Origin of data varies:
• Central Postcode Directories: 1980 - 1990 (except 1989)
• AFPDs: 1991 - 1998 (except 1996 & 1997)
• NHSPD: 1996 & 1997
• AFPD (NHS Variant): 1999
• AFPD (Gridlink version): 2000
• + Gridlink versions of AFPD from 2001 to current release.
• With the exception of 1989, this is a complete set, which is quite remarkable
given that digital curation & preservation is a fairly recent concern.
Phase I – Data Loading
• We took each historic version, loaded it into its own
database table (database used is PostgreSQL) &
then merged each year's table into a super table
giving all postcodes from all versions of the AFPD.
• Given the differing origins of the year tables and the
tendency for number of attributes to increase over
time, the harmonisation of these snapshots itself
was an "interesting" data management challenge.
For practical purposes fields were distilled down to a
core set.
• The super table was reduced to a table with distinct
postcode labels (giving the labels of all postcodes
since 1980) and then to the more valuable postcode
instance table.
• Composite merged table - 50,986,078 rows
• Distinct postcode unit table - 2,330,886 rows
• Postcode Instance table - 2,763,839 rows
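The reduction from the merged table to distinct labels and then to instances can be sketched on a handful of rows. The row layout (snapshot year, postcode, DOI), the "YYYYMM" date format and the postcode labels are invented for illustration; in PostgreSQL the same step would amount to selecting distinct values.

```python
# Hypothetical rows from the merged "super table": (snapshot_year, postcode, doi)
merged_rows = [
    (1980, "ZZ1 1AA", "197301"),
    (1981, "ZZ1 1AA", "197301"),  # same instance seen in a later snapshot
    (1990, "ZZ1 1AA", "198906"),  # same label, new date of introduction
    (1990, "ZZ1 1AB", "198906"),
]

# Distinct postcode labels: every label seen in any snapshot
distinct_labels = {postcode for _, postcode, _ in merged_rows}

# Postcode instances: distinct (label, date of introduction) pairs
instances = {(postcode, doi) for _, postcode, doi in merged_rows}

print(len(merged_rows), len(distinct_labels), len(instances))
```

As in the real tables, the instance count sits between the number of distinct labels and the number of merged rows.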
Phase I – Data Loading
• By itself Date of Introduction only tells us when a postcode was instantised.
In order to be able to examine the lifecycle of each instance we also need to
know if this instance has been terminated or is still live.
• To each instance we attempted to add a Date Of Termination (DOT) by
searching through each of the historic AFPD version tables and determining if
the instance was terminated. Not a trivial task given volumes of data and
number of searches required.
• At the same time each instance also had its latest grid reference
associated with it.
• Instance database is therefore quite rich as it holds both the temporal and
spatial history for the instances associated with a postcode.
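One possible reading of the DOT search is sketched below: scan the yearly tables in order and take the first non-blank termination date recorded for the instance. The function name, the dictionary layout and the "first non-blank value wins" rule are all assumptions for this sketch; the slides do not describe the exact procedure that was used.

```python
def find_dot(key, snapshots):
    """Return the first non-blank date of termination recorded for `key`.

    key: (postcode, doi) tuple identifying the instance.
    snapshots: dict mapping snapshot year -> {(postcode, doi): dot or None}.
    Returns None if no snapshot records a termination (instance still live).
    """
    for year in sorted(snapshots):
        dot = snapshots[year].get(key)
        if dot:
            return dot
    return None


# Made-up data: the instance is live in 1985 and terminated by 1990
key = ("ZZ1 1AA", "197301")
snapshots = {
    1985: {key: None},
    1990: {key: "198912"},
    1995: {key: "198912"},
}
```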
Phase II – Quality Assurance
(Audit)
• Rationale for Quality Assurance – The quality of the instance database will be
propagated to derived products. It is therefore essential that we understand
which instances are genuine and which are spurious and may need to be fixed or
weeded out.
• First Step – Analysis of the frequency of instances associated with distinct postcodes.
• Frequency of instances associated with distinct postcodes:
Num of postcode instances : Frequency
1 : 2,379,140
2 : 343,995
3 : 34,986
4 : 4,839
5 : 571
6 : 85
7 : 27
8 : 26
9 : 138
10 : 18
11 : 8
12 : 2
13 : 4
• Straight away we can see that in some cases distinct postcodes have multiple
instances associated with them.
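A frequency analysis of this kind can be sketched with two counting passes: instances per postcode, then postcodes per instance count. This is only one plausible way to tabulate the data; the slides do not say exactly how the counts above were produced, and the instance list here is invented.

```python
from collections import Counter

# Hypothetical instance table: (postcode, doi) pairs
instances = [
    ("ZZ1 1AA", "197301"),
    ("ZZ1 1AA", "198906"),
    ("ZZ1 1AB", "198906"),
    ("ZZ1 1AC", "199502"),
]

# Pass 1: how many instances does each postcode have?
instances_per_postcode = Counter(postcode for postcode, _ in instances)

# Pass 2: how many postcodes have 1 instance, 2 instances, and so on?
frequency = Counter(instances_per_postcode.values())

for n_instances in sorted(frequency):
    print(n_instances, ":", frequency[n_instances])
```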
Phase II – Quality Assurance
(Audit)
• Majority of postcodes represented by only a single instance. But significant
number of postcodes have multiple instances associated with them – why?
• Genuine Postcode Recycling
• Spurious Instances due to imputation problems or systematic tablewide
update procedures in past versions (e.g. update for all Scottish 1973
instances in 1980 table).
• Expected vs. Divergent Cases.
Phase II – Quality Assurance
(Audit)
• Programmatic tests were designed to flag cases in the Instance database
which diverged from what we expected.
• We do this by taking each postcode in turn and examining the timelines
associated with its instances. Errors are grouped into 3 types:
• Type I - in which the DOI = DOT (the instance is instantised & terminated at
the same point in time)
• Type II – (A) in which all instances of the postcode are live or (B) there are
other inconsistencies within the timeline such as blank dates of termination
within a sequence of instances.
• Type III - multiple dates of termination - postcode instantised once but has
multiple dates of termination
The names of these error types are a convenience – not to be confused with
Type I/II errors in Statistics!
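The three error types can be expressed as a small flagging function applied to the records of one postcode. This is a sketch under assumptions: records are (DOI, DOT) pairs with dates as "YYYYMM" strings and None for a blank DOT, and Type II.B is read as "a live instance that is followed by a later instance". It is not the test suite actually run against the instance database.

```python
def flag_postcode(records):
    """Flag divergent timelines for one postcode.

    records: list of (doi, dot) pairs; dot is None when no termination is held.
    Returns a set drawn from {"I", "II.A", "II.B", "III"}.
    """
    flags = set()

    # Type I: instance introduced and terminated at the same point in time
    if any(dot is not None and doi == dot for doi, dot in records):
        flags.add("I")

    # Type III: one date of introduction carrying several dates of termination
    dots_by_doi = {}
    for doi, dot in records:
        if dot is not None:
            dots_by_doi.setdefault(doi, set()).add(dot)
    if any(len(dots) > 1 for dots in dots_by_doi.values()):
        flags.add("III")

    # Type II: inspect which instances are live, in order of introduction
    ordered = sorted(records, key=lambda record: record[0])
    if len(ordered) > 1:
        live = [dot is None for _, dot in ordered]
        if all(live):
            flags.add("II.A")   # every instance of the postcode is live
        elif any(live[:-1]):
            flags.add("II.B")   # blank DOT within the sequence
    return flags
```

A postcode with a clean timeline, such as one terminated instance followed by one live instance, returns an empty set.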
Phase II – Quality Assurance
(Audit)
[Bar chart: count of spurious instances by type]
Spurious Instance Type : Count
I : 3,558
II.A : 347,828
II.B : 206,001
III : 4,448
Phase II – Quality Assurance
(Audit)
• As we can see the Type II error cases represent the bulk of the errors so
effort has been directed at identifying different varieties of this type of error.
We will spend a few minutes examining two such examples now.
Phase II – Quality Assurance
(Audit)
• Case A
• 6 instances, none of which has a date of termination - conflict immediately
after the first case.
• Is it valid for there to be so many postcodes which have multiple live
instances?
• Are all of these cases a result of postcode recycling or are they in fact due to
inconsistencies within the dataset itself?
Phase II – Quality Assurance
(Audit)
• Case B
• Again we have 6 instances - this time there is a blank date of termination
within the timeline (which conflicts with the latter 2 instances)
Phase II – Quality Assurance
(Audit)
• Why are these a problem? - when we create the historic cuts we don't want
any ambiguity.
• need to be sure that all live postcodes are truly live (and should not have
been terminated).
• that where a postcode has multiple instances associated with it, these are
genuine and not a result of problems with how the data was created or
updated.
• that all data is as consistent as possible.
• How to reconcile these Spurious cases?
Phase III – QA - Verification
• Type I errors - unclear - we can't see any logic behind these, so we ask:
is it valid for an instance to be introduced and terminated in the same month?
• Type II errors - problem less clear cut as we have already seen - different
species of the same problem causing instances to diverge from the expected
norm.
• Type III errors - multiple dates of termination - As a rule, pick either the
earliest OR latest and apply to all cases
• The rest of the presentation is mainly concerned with the Type II errors.
• Key Assumption – Instance database holds information about the location of
each instance in space and time. Instances which are similar in both these
respects can be merged.
Phase III – QA - Verification
• Time - According to Royal Mail:
• A postcode is only supposed to be reused after a minimum period of 3 years
has elapsed & residential postcodes are never reused.
• On this basis where we have 2 instances which are instantised within less
than 3 years of one another we can assume that they are referring to the
same thing.
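The temporal rule can be sketched as grouping the dates of introduction for one postcode so that any two consecutive dates less than 3 years apart fall into the same group. The "YYYYMM" date format, the function names and the choice to compare each date with the previous one in the sequence are assumptions of this sketch.

```python
def months(yyyymm):
    """Convert a "YYYYMM" string into a running month count."""
    return int(yyyymm[:4]) * 12 + int(yyyymm[4:]) - 1


def group_by_time(dois, min_gap_months=36):
    """Group dates of introduction that lie less than min_gap_months apart.

    Each group is a candidate for merging into a single instance.
    """
    groups = []
    for doi in sorted(dois):
        if groups and months(doi) - months(groups[-1][-1]) < min_gap_months:
            groups[-1].append(doi)   # too soon to be a genuine reuse
        else:
            groups.append([doi])     # gap of 3 years or more: keep separate
    return groups


print(group_by_time(["198001", "198106", "199001"]))
```

In this made-up example the first two dates are 17 months apart and would be treated as the same thing, while the third stands alone.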
Phase III – QA - Verification
Space (Geography)
• Nearby things tend to be more similar than things that are further apart.
• Instances located close to one another likely reference the same set of
addresses. Instances located further apart may represent recycling
events.
• For a postcode we can see how its instances change in position over
time - are they spatially stationary or more dynamic?
• How to quantify this within the instance table? - for each set of instances
associated with a postcode unit compute the change in easting & northing
between instances.
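The change in position between successive instances can be sketched as a straight-line distance computed from the differences in easting and northing. The function name and the coordinates are invented for illustration; grid references are assumed to be in metres and ordered by date of introduction.

```python
import math


def displacements(points):
    """Distance in metres between each consecutive pair of grid references.

    points: list of (easting, northing) tuples, ordered by date of introduction.
    """
    return [
        math.hypot(e2 - e1, n2 - n1)
        for (e1, n1), (e2, n2) in zip(points, points[1:])
    ]


# Made-up grid references for three instances of one postcode
timeline = [(325000, 673000), (325003, 673004), (325003, 673004)]
print(displacements(timeline))
```

A run of zeros would indicate a spatially stationary postcode, while a large value would be a candidate recycling event.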
Phase III – QA - Verification
• BUT we need to be aware of the spatial accuracy issue. Accuracy with which
grid references have been assigned to postcodes has increased over time as
methodologies have changed with technology advances.
• An overall increase in accuracy of georeferencing over time.
• Instance location change may therefore operate at multiple scales – a local
change due to changes in georeferencing plus a larger change brought about
by recycling.
Phase III – QA - Verification
• Summary statistics for all instances:
• 75% of postcodes with multiple instances record no change in location
whatsoever.
• Of those that do exhibit location change, in 90% of cases this was between
1m and 3km with the remaining cases exhibiting a change of up to 500km.
• Clearly it would be useful if we had a spatial threshold (like the 3 year
temporal threshold) that we could use to decide whether 2 instances should
be merged or kept separate as genuine reuses.
• We argue that using a combination of temporal & spatial measures of
similarity it is possible to discriminate between genuine and spurious
instances.
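A combined test might report the temporal and spatial measures side by side for a pair of instances. Everything here is an assumption for illustration: the record layout, the "YYYYMM" dates and the threshold values passed in. No spatial threshold has been chosen yet, and how the two measures should be combined (either or both) is left to the caller.

```python
import math
from typing import NamedTuple, Tuple


class Instance(NamedTuple):
    doi: str        # date of introduction, assumed "YYYYMM"
    easting: int    # metres
    northing: int   # metres


def _months(yyyymm: str) -> int:
    return int(yyyymm[:4]) * 12 + int(yyyymm[4:]) - 1


def similarity(a: Instance, b: Instance,
               max_gap_months: int, max_distance_m: float) -> Tuple[bool, bool]:
    """Return (close_in_time, close_in_space) for two instances."""
    gap = abs(_months(b.doi) - _months(a.doi))
    distance = math.hypot(b.easting - a.easting, b.northing - a.northing)
    return gap < max_gap_months, distance <= max_distance_m


a = Instance("198001", 325000, 673000)
b = Instance("198106", 325003, 673004)
c = Instance("199001", 425000, 673000)
```

With a 36 month gap and an illustrative 100m threshold, a and b look like the same thing on both measures, while a and c look like a genuine reuse on both.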
Phase III – QA - Verification
• Research has only recently begun to engage with this problem. Progress has
been hindered by the size of the datasets involved and the pain involved in
isolating indicative cases.
• Significant time has been invested in exploring the problem but we are by no
means experts - we need feedback - does this methodology seem
appropriate - are our core assumptions logical?
• Plans are to explore the effects of applying different threshold values - using
known cases of reuse to inform selection of threshold value.
• Pick a threshold value - determine the effects of applying this to the dataset
as a whole, e.g. the number of merges that this yields, taking samples
to determine the validity of results - are instances inappropriately merged?
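Exploring candidate thresholds could be as simple as counting, for each value, how many instance pairs would fall within it and so be merged. The function name, the distances and the candidate thresholds below are made up for this sketch; real values would come from the instance table and from known cases of reuse.

```python
def merges_per_threshold(distances_m, thresholds_m):
    """Count how many instance pairs each spatial threshold would merge.

    distances_m: distance in metres between consecutive instances of a postcode.
    thresholds_m: candidate upper limits for treating two instances as one.
    """
    return {
        threshold: sum(distance <= threshold for distance in distances_m)
        for threshold in thresholds_m
    }


# Hypothetical distances between instance pairs, in metres
sample_distances = [0, 50, 2500, 400000]
print(merges_per_threshold(sample_distances, [100, 3000]))
```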
Phase III – QA - Verification
• Demonstrate application of these rules by going back to the Spurious cases
we looked at earlier.
• Case A - using our temporal rule of 3 years - these 6 could be compressed to
3 instances. Using our spatial rule (assuming that our upper spatial threshold
exceeds 100m) these could be compressed to a single instance.
Phase III – QA - Verification
• Case B - the inconsistent instance must either be terminated or merged with
another instance. Applying the temporal rule it could be merged with the
following instance. However its location is quite different and so we might decide
that this falls outside our threshold and so instead we might terminate it with
the start date of the following instance.
Phase IV – Create QA Instance DB
• At some point, in order to move forward, we are going to have to
implement the rules from Phase III and carry out the updates to the instance
database.
• In doing this we run the risk of going in one of two directions - we can
either be too inclusive, leading to too many instances being merged together,
or not inclusive enough, with too few instances merged
together.
• We intend to be pragmatic about this - we simply cannot have so many
possibly false instances associated with each postcode. Unlikely that we are
going to be able to resolve all cases.
• Once the rules are in place, implementing them should be fairly
straightforward.
Creation of Historic Snapshots
• With the Quality Controlled Instance database in place, yearly historic
versions of the postcode directories can then be derived by pulling out all
instances that exist within a particular time slice.
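Pulling out a time slice can be sketched as a filter on the dates of introduction and termination. The tuple layout, the "YYYYMM" strings and the boundary choice (an instance counts as existing from its DOI up to, but not including, its DOT) are assumptions of this sketch rather than agreed rules.

```python
def snapshot(instances, as_of):
    """Return the instances that exist at the point in time `as_of`.

    instances: list of (postcode, doi, dot) tuples; dot is None while live.
    as_of: "YYYYMM" string, which compares correctly as plain text.
    """
    return [
        (postcode, doi, dot)
        for postcode, doi, dot in instances
        if doi <= as_of and (dot is None or dot > as_of)
    ]


# Made-up instance table
instances = [
    ("ZZ1 1AA", "197301", "198906"),
    ("ZZ1 1AA", "199201", None),
    ("ZZ1 1AB", "199601", None),
]
print(snapshot(instances, "199512"))
```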
Outstanding Issues
• Reconciling the spurious instances is still an ongoing task.
• We would welcome comments/feedback about the
assumptions/methodologies we have chosen to adopt, both from ONS and
from other expert users of the AFPD.
• Is there any documentation which might shed light on procedures used to
update the datasets in the past & might explain some of the systematic
inconsistencies we have discovered?
Conclusions
• 1. Historical & Contemporary postcode directory datasets are being accessed
by academic users through UKBORDERS.
• 2. QA process data has been received and loaded - raw instance database
has been created.
• 3. Quality Assurance Audit has been carried out - quality of dataset has been
assessed.
• 4. Significant Progress has been made in reconciling inconsistencies, but work
remains before derived data can be created and exposed to user community.
• 5. Feedback on work to date and input from other users is requested in
order to bring work to a close.
Contact Details
• http://edina.ac.uk/
• james.crone@ed.ac.uk
• Questions?

More Related Content

Similar to Historic Postcode Directories

Final presentation
Final presentationFinal presentation
Final presentationAmogh Hajare
 
Internet of things - 3/4. Solving the problems
Internet of things - 3/4. Solving the problemsInternet of things - 3/4. Solving the problems
Internet of things - 3/4. Solving the problemsSumanth Bhat
 
Update of time-invalid information in knowledge bases through mobile agents
Update of time-invalid information in knowledge bases through mobile agentsUpdate of time-invalid information in knowledge bases through mobile agents
Update of time-invalid information in knowledge bases through mobile agentsVrije Universiteit Amsterdam
 
Tsinghua University: Two Exemplary Applications in China
Tsinghua University: Two Exemplary Applications in ChinaTsinghua University: Two Exemplary Applications in China
Tsinghua University: Two Exemplary Applications in ChinaDataStax Academy
 
Time Series Analysis.pptx
Time Series Analysis.pptxTime Series Analysis.pptx
Time Series Analysis.pptxSunny429247
 
Tracking Project WBS
Tracking Project WBSTracking Project WBS
Tracking Project WBSCameron
 
Data Wrangling: Working with Date / Time Data and Visualizing It
Data Wrangling: Working with Date / Time Data and Visualizing ItData Wrangling: Working with Date / Time Data and Visualizing It
Data Wrangling: Working with Date / Time Data and Visualizing Itkanaugust
 
CN Module 5 part 2 2022.pdf
CN Module 5 part 2 2022.pdfCN Module 5 part 2 2022.pdf
CN Module 5 part 2 2022.pdfMayankRaj687571
 
Building highly reliable data pipeline @datadog par Quentin François
Building highly reliable data pipeline @datadog par Quentin FrançoisBuilding highly reliable data pipeline @datadog par Quentin François
Building highly reliable data pipeline @datadog par Quentin FrançoisParis Data Engineers !
 
Social Media World News Impact on Stock Index Values - Investment Fund Analyt...
Social Media World News Impact on Stock Index Values - Investment Fund Analyt...Social Media World News Impact on Stock Index Values - Investment Fund Analyt...
Social Media World News Impact on Stock Index Values - Investment Fund Analyt...Bernardo Najlis
 
Asufe juniors-training session2
Asufe juniors-training session2Asufe juniors-training session2
Asufe juniors-training session2Omar Ahmed
 
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Spark Summit
 
Tutorial(release)
Tutorial(release)Tutorial(release)
Tutorial(release)Oshin Hung
 
Anomaly detection - TIBCO Data Science Central
Anomaly detection - TIBCO Data Science CentralAnomaly detection - TIBCO Data Science Central
Anomaly detection - TIBCO Data Science CentralMichael O'Connell
 
Smart Urban Planning Support through Web Data Science on Open and Enterprise ...
Smart Urban Planning Support through Web Data Science on Open and Enterprise ...Smart Urban Planning Support through Web Data Science on Open and Enterprise ...
Smart Urban Planning Support through Web Data Science on Open and Enterprise ...Gloria Re Calegari
 
Network Detroit 9/25/15
Network Detroit 9/25/15Network Detroit 9/25/15
Network Detroit 9/25/15Ellice Engdahl
 
Hadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparatorHadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparatorSubhas Kumar Ghosh
 

Similar to Historic Postcode Directories (20)

Final presentation
Final presentationFinal presentation
Final presentation
 
Cleansing land ownership data, an FME use case - David Eagle
Cleansing land ownership data, an FME use case - David EagleCleansing land ownership data, an FME use case - David Eagle
Cleansing land ownership data, an FME use case - David Eagle
 
Internet of things - 3/4. Solving the problems
Internet of things - 3/4. Solving the problemsInternet of things - 3/4. Solving the problems
Internet of things - 3/4. Solving the problems
 
Update of time-invalid information in knowledge bases through mobile agents
Update of time-invalid information in knowledge bases through mobile agentsUpdate of time-invalid information in knowledge bases through mobile agents
Update of time-invalid information in knowledge bases through mobile agents
 
Tsinghua University: Two Exemplary Applications in China
Tsinghua University: Two Exemplary Applications in ChinaTsinghua University: Two Exemplary Applications in China
Tsinghua University: Two Exemplary Applications in China
 
Time Series Analysis.pptx
Time Series Analysis.pptxTime Series Analysis.pptx
Time Series Analysis.pptx
 
Tracking Project WBS
Tracking Project WBSTracking Project WBS
Tracking Project WBS
 
TECHNIP
TECHNIPTECHNIP
TECHNIP
 
Data Wrangling: Working with Date / Time Data and Visualizing It
Data Wrangling: Working with Date / Time Data and Visualizing ItData Wrangling: Working with Date / Time Data and Visualizing It
Data Wrangling: Working with Date / Time Data and Visualizing It
 
CN Module 5 part 2 2022.pdf
CN Module 5 part 2 2022.pdfCN Module 5 part 2 2022.pdf
CN Module 5 part 2 2022.pdf
 
Building highly reliable data pipeline @datadog par Quentin François
Building highly reliable data pipeline @datadog par Quentin FrançoisBuilding highly reliable data pipeline @datadog par Quentin François
Building highly reliable data pipeline @datadog par Quentin François
 
FME & Governement
FME & GovernementFME & Governement
FME & Governement
 
Social Media World News Impact on Stock Index Values - Investment Fund Analyt...
Social Media World News Impact on Stock Index Values - Investment Fund Analyt...Social Media World News Impact on Stock Index Values - Investment Fund Analyt...
Social Media World News Impact on Stock Index Values - Investment Fund Analyt...
 
Asufe juniors-training session2
Asufe juniors-training session2Asufe juniors-training session2
Asufe juniors-training session2
 
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
 
Tutorial(release)
Tutorial(release)Tutorial(release)
Tutorial(release)
 
Anomaly detection - TIBCO Data Science Central
Anomaly detection - TIBCO Data Science CentralAnomaly detection - TIBCO Data Science Central
Anomaly detection - TIBCO Data Science Central
 
Smart Urban Planning Support through Web Data Science on Open and Enterprise ...
Smart Urban Planning Support through Web Data Science on Open and Enterprise ...Smart Urban Planning Support through Web Data Science on Open and Enterprise ...
Smart Urban Planning Support through Web Data Science on Open and Enterprise ...
 
Network Detroit 9/25/15
Network Detroit 9/25/15Network Detroit 9/25/15
Network Detroit 9/25/15
 
Hadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparatorHadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparator
 

Recently uploaded

VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts ServiceCall Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Servicejennyeacort
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationBoston Institute of Analytics
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 

Recently uploaded (20)

VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts ServiceCall Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health Classification
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...

Historic Postcode Directories

  • 1. Historic Postcode Directories - Progress and Plans Postcode GeoReferencing User Group, 5th April James Crone, EDINA.
  • 2. Overview • About EDINA • Project Background and Context • Progress To Date • Plans for coming months • Outstanding Issues
  • 3. EDINA • A JISC funded national data centre based at Edinburgh University Data Library. • Provides the UK tertiary education and research community with online access to a library of data, information and research resources. • The largest section (Geo Data Services), made up of GIS Specialists and Software Engineers, provides access to 2 key online services - Digimap & UKBORDERS. • We and our user community have an interest in both contemporary and historical postcode products.
  • 4. Background & Context • What are the historical postcode directories? - datasets which list all unit postcodes within the UK and assign to them a national grid reference, geographic lookups and counts of assigned addresses. • ESRC has purchased Gridlinked versions of AFPD (2001-2006) for use by the academic community. This community also has an interest in historic versions of the AFPD and thus ONS supplied to ESRC historic postcode directories (1980-2000) for free on the basis that ESRC would QA the historic versions. • All versions of postcode directories received by ESRC have been available to users through the EDINA UKBORDERS service since October 2004. • Steady stream of user downloads. Data for census years most popular but, interestingly, significant interest in non-census years.
  • 5. Deliverables • Objectives/Deliverables of the QA set out formally in August 2004 MOU between ESRC & ONS: • Key Deliverable is a Quality Controlled postcode instance database spanning 1980 to present day. From this ESRC will derive snapshot historical versions of the postcode directories replacing the versions of unknown quality that are currently in existence. • Postcode Instance - defined as the existence of a postcode for a certain period of time which is unique on both postcode label and date of introduction. • Postcode Instance = Postcode Label + Date of Introduction • Instance db will have a number of fields – date of introduction (DOI), date of termination (DOT), most recent easting & northing and higher geography lookups (1991 ED/OA; 1998 Ward; 2001 OA). • The ONS Ward History Database will be used to check the veracity of ward codes within the historic versions of the postcode directories.
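The instance definition on slide 5 can be illustrated with a small Python record type. This is only a sketch: the class name, the field names and the "YYYY-MM" date format are assumptions made for illustration and are not taken from the project's actual PostgreSQL schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PostcodeInstance:
    """One instance of a postcode (hypothetical field names)."""
    postcode: str                    # postcode label
    doi: str                         # date of introduction, assumed "YYYY-MM"
    dot: Optional[str] = None        # date of termination, None while live
    easting: Optional[int] = None    # most recent grid reference
    northing: Optional[int] = None
    ed_1991: Optional[str] = None    # higher geography lookups
    ward_1998: Optional[str] = None
    oa_2001: Optional[str] = None

    @property
    def key(self) -> Tuple[str, str]:
        # An instance is unique on postcode label + date of introduction.
        return (self.postcode, self.doi)

    @property
    def is_live(self) -> bool:
        # No date of termination means the instance is still live.
        return self.dot is None


first = PostcodeInstance("AB1 0AA", "1980-01", dot="1990-06")
reuse = PostcodeInstance("AB1 0AA", "1995-06")
print(first.key, first.is_live)
print(reuse.key, reuse.is_live)
```

Two records with the same label but different dates of introduction have different keys, which is what lets the database hold a recycled postcode more than once.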
  • 6. Progress to Date • 4 sequential work phases to complete these objectives: • I. Data Loading (complete) • II. Quality Assurance I - Audit (complete) • III. Quality Assurance II - Verification (in progress) • IV. Production of Historic Snapshots • At this point first 2 of these are complete and we are currently engaged in the verification phase. • ... Taking each phase in turn
  • 7. Phase I – Data Loading • Postcode directories were supplied by ONS from 1980 to present day. • Origin of data varies: • Central Postcode Directories: 1980 - 1990 (except 1989) • AFPDs: 1991 - 1998 (except 1996 & 1997) • NHSPD: 1996 & 1997 • AFPD (NHS Variant): 1999 • AFPD (Gridlink version): 2000 • + Gridlink versions of AFPD from 2001 to current release. • With the exception of 1989, a complete set, quite remarkable given that digital curation & preservation is a fairly recent concern.
  • 8. Phase I – Data Loading • We took each historic version, loaded it into its own database table (database used is PostgreSQL) & then merged each year's table into a super table giving all postcodes from all versions of the AFPD. • Given the differing origins of the year tables and the tendency for the number of attributes to increase over time, the harmonisation of these snapshots was itself an "interesting" data management challenge. For practical purposes fields were distilled down to a core set. • The super table was reduced to a table of distinct postcode labels (giving the labels of all postcodes since 1980) and then to the more valuable postcode instance table. • Composite merged table - 50,986,078 rows • Distinct postcode unit table - 2,330,886 rows • Postcode Instance table - 2,763,839 rows
  • 9. Phase I – Data Loading • By itself Date of Introduction only tells us when a postcode was instantised. In order to be able to examine the lifecycle of each instance we also need to know if this instance has been terminated or is still live. • To each instance we attempted to add a Date Of Termination (DOT) by searching through each of the historic AFPD version tables and determining if the instance was terminated. Not a trivial task given the volumes of data and number of searches required. • At the same time each instance also had its latest grid reference associated with it. • Instance database is therefore quite rich as it holds both the temporal and spatial history for the instances associated with a postcode.
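The reduction described on slide 8, from merged table to distinct labels to instances, can be sketched in Python as follows. The row layout and the sample values are invented for illustration; the project itself performed this step in PostgreSQL, and its actual queries are not shown in the slides.

```python
# Hypothetical rows from the merged "super table":
# (postcode label, date of introduction "YYYY-MM", directory year)
merged_rows = [
    ("AB1 0AA", "1980-01", 1980),
    ("AB1 0AA", "1980-01", 1981),
    ("AB1 0AA", "1995-06", 1996),
    ("AB1 0AB", "1980-01", 1980),
]

# Distinct postcode labels: every label seen in any directory version.
distinct_postcodes = {postcode for postcode, _doi, _year in merged_rows}

# Postcode instances: unique on label + date of introduction.
instances = {(postcode, doi) for postcode, doi, _year in merged_rows}

print(len(merged_rows), len(distinct_postcodes), len(instances))
```

The same label appearing with two different dates of introduction yields two instances, which is why the instance table is larger than the distinct postcode table.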
  • 10. Phase II – Quality Assurance (Audit) • Rationale for Quality Assurance – The quality of the instance database will be propagated to derived products, so it is essential that we understand which instances are genuine, which can be regarded as spurious, and which may need to be fixed or weeded out. • First Step – Analysis of the frequency of instances associated with distinct postcodes. • Frequency of instances associated with distinct postcodes (number of postcode instances : frequency) - 1 : 2,379,140; 2 : 343,995; 3 : 34,986; 4 : 4,839; 5 : 571; 6 : 85; 7 : 27; 8 : 26; 9 : 138; 10 : 18; 11 : 8; 12 : 2; 13 : 4 • Straightaway we can see that in some cases distinct postcodes have multiple instances associated with them.
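A frequency analysis of this kind can be sketched with two counting passes: first the number of instances per postcode, then how often each of those counts occurs. The instance list below is made up for illustration and does not reproduce the project's data.

```python
from collections import Counter

# Hypothetical instance keys: (postcode label, date of introduction)
instances = [
    ("AB1 0AA", "1980-01"),
    ("AB1 0AA", "1995-06"),
    ("AB1 0AB", "1980-01"),
    ("AB1 0AD", "1980-01"),
    ("AB1 0AD", "1984-03"),
    ("AB1 0AD", "1990-09"),
]

# Pass 1: number of instances recorded against each postcode label.
per_postcode = Counter(postcode for postcode, _doi in instances)

# Pass 2: how many postcodes have 1, 2, 3, ... instances.
frequency = Counter(per_postcode.values())

for n_instances in sorted(frequency):
    print(n_instances, ":", frequency[n_instances])
```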
  • 11. Phase II – Quality Assurance (Audit) • Majority of postcodes represented by only a single instance. But significant number of postcodes have multiple instances associated with them – why? • Genuine Postcode Recycling • Spurious Instances due to imputation problems or systematic tablewide update procedures in past versions (i.e. update for all Scottish 1973 instances in 1980 table). • Expected vs. Divergent Cases.
  • 12. Phase II – Quality Assurance (Audit)
  • 13. Phase II – Quality Assurance (Audit)
  • 14. Phase II – Quality Assurance (Audit) • Programmatic tests were designed to flag cases in the Instance database which diverged from what we expected. • We do this by taking each postcode in turn and examining the timelines associated with its instances. Errors grouped into 3 types: • Type I - in which the DOI = DOT (the instance is instantised & terminated at the same point in time) • Type II – (A) in which all instances of the postcode are live or (B) there are other inconsistencies within the timeline such as blank dates of termination within a sequence of instances. • Type III - multiple dates of termination - postcode instantised once but has multiple dates of termination • The names of these errors are a convenience – not to be confused with Type I/II errors in Statistics!
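One way such programmatic tests could look is sketched below. The function name, the record layout and the exact reading of each rule (for example, that Type II only applies when a postcode has more than one date of introduction) are assumptions for illustration; the slides do not show the tests that were actually run.

```python
def audit_postcode(records):
    """Return the set of audit flags for one postcode.

    records: list of (doi, dot) tuples with dates as "YYYY-MM" strings;
    dot is None when the instance has no date of termination.
    """
    flags = set()

    # Type I: introduced and terminated at the same point in time.
    if any(dot is not None and doi == dot for doi, dot in records):
        flags.add("I")

    # Type III: one date of introduction, several dates of termination.
    dots_by_doi = {}
    for doi, dot in records:
        if dot is not None:
            dots_by_doi.setdefault(doi, set()).add(dot)
    if any(len(dots) > 1 for dots in dots_by_doi.values()):
        flags.add("III")

    # Type II only makes sense when there is a sequence of instances.
    dois = sorted({doi for doi, _dot in records})
    if len(dois) > 1:
        live = {doi for doi, dot in records if dot is None}
        if live == set(dois):
            flags.add("II.A")  # every instance is live
        elif any(doi in live for doi in dois[:-1]):
            flags.add("II.B")  # blank DOT before a later instance
    return flags


print(audit_postcode([("1980-01", "1984-01"), ("1990-01", None)]))
print(audit_postcode([("1980-01", None), ("1985-03", None)]))
```

The first call is the expected case (one terminated instance followed by a live one) and raises no flags; the second has two live instances and is flagged as Type II.A.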
  • 15. Phase II – Quality Assurance (Audit) • [Bar chart: count of spurious instances by type. Type I: 3,558; Type II.A: 347,828; Type II.B: 206,001; Type III: 4,448.]
  • 16. Phase II – Quality Assurance (Audit) • As we can see the Type II error cases represent the bulk of the errors so effort has been directed at identifying different varieties of this type of error. We will spend a few minutes examining two such examples now.
  • 17. Phase II – Quality Assurance (Audit) • Case A • 6 instances never with a date of termination - conflict immediately after the first case. • Is it valid for there to be so many postcodes which have multiple live instances? • Are all of these cases a result of postcode recycling or are they in fact due to inconsistencies within the dataset itself?
  • 18. Phase II – Quality Assurance (Audit) • Case B • Again we have 6 instances - this time there is a blank date of termination within the timeline (which conflicts with the latter 2 instances)
  • 19. Phase II – Quality Assurance (Audit) • Why are these a problem? - when we create the historic cuts we don't want any ambiguity. • We need to be sure that all live postcodes are truly live (and should not have been terminated). • That where a postcode has multiple instances associated with it, these are genuine and not a result of problems with how the data was created or updated. • That all data is as consistent as possible. • How to reconcile these Spurious cases?
  • 20. Phase III – QA - Verification • Type I errors - unclear - we can't see any logic behind this - to which we ask: is it valid for an instance to be introduced and terminated in the same month? • Type II errors - problem less clear cut as we have already seen - different species of the same problem causing instances to diverge from the expected norm. • Type III errors - multiple dates of termination - As a rule, pick either the earliest OR latest and apply to all cases • Mainly concerned in the rest of the presentation with dealing with the Type II errors. • Key Assumption – Instance database holds information about the location of each instance in space and time. Instances which are similar in both these respects can be merged.
  • 21. Phase III – QA - Verification
  • 22. Phase III – QA - Verification • Time - According to Royal Mail: • A postcode is only supposed to be reused after a minimum period of 3 years has elapsed & residential postcodes are never reused. • On this basis where we have 2 instances which are instantised within less than 3 years of one another we can assume that they are referring to the same thing.
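The temporal rule can be sketched as a grouping step over a postcode's dates of introduction. The helper names, the "YYYY-MM" date format and the choice to compare each date with the previous one in the sequence (so that groups can chain) are assumptions for illustration, not the project's implementation.

```python
def months(ym):
    """Convert a "YYYY-MM" string to a month count for simple arithmetic."""
    year, month = ym.split("-")
    return int(year) * 12 + int(month)


def group_by_time(dois, min_gap_years=3):
    """Group dates of introduction that are less than the reuse gap apart.

    Each date is compared with the previous date in sorted order, so a
    run of closely spaced instances ends up in a single group.
    """
    groups = []
    for doi in sorted(dois):
        if groups and months(doi) - months(groups[-1][-1]) < min_gap_years * 12:
            groups[-1].append(doi)   # too soon to be genuine reuse
        else:
            groups.append([doi])     # far enough apart to stand alone
    return groups


dois = ["1980-01", "1981-06", "1990-01", "1992-12", "1996-01"]
for group in group_by_time(dois):
    print(group)
```

Under these assumptions the five dates collapse to three candidate instances: the first two dates are 17 months apart, the next two are 35 months apart, and the last is 37 months after its predecessor.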
  • 23. Phase III – QA - Verification Space (Geography) • Nearby things tend to be more similar than things that are further apart. • Instances located close to one another likely reference the same set of addresses. Instances located further apart may represent recycling events. • For a postcode we can see how its instances change in position over time - are they spatially stationary or more dynamic? • How to quantify this within the instance table? - for each set of instances associated with a postcode unit compute the change in easting & northing between instances.
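The change in easting and northing between successive instances can be computed as in the sketch below. The function name, the tuple layout and the sample grid references are invented for illustration; the straight-line distance is added here as a convenient single measure and is not mentioned in the slides.

```python
import math


def displacements(positions):
    """Change in position between successive instances of one postcode.

    positions: list of (doi, easting, northing), grid references in metres.
    Returns a list of (doi_from, doi_to, d_easting, d_northing, distance).
    """
    ordered = sorted(positions)  # chronological, since doi is "YYYY-MM"
    moves = []
    for (doi0, e0, n0), (doi1, e1, n1) in zip(ordered, ordered[1:]):
        de, dn = e1 - e0, n1 - n0
        moves.append((doi0, doi1, de, dn, math.hypot(de, dn)))
    return moves


history = [
    ("1980-01", 325000, 673000),
    ("1985-01", 325003, 673004),   # small shift, e.g. improved georeferencing
    ("1990-01", 425003, 673004),   # large shift, possibly a recycling event
]
for move in displacements(history):
    print(move)
```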
  • 24. Phase III – QA - Verification • BUT we need to be aware of the spatial accuracy issue. Accuracy with which grid references have been assigned to postcodes has increased over time as methodologies have changed with technology advances. • An overall increase in accuracy of georeferencing over time. • Instance location change may therefore operate at multiple scales – a local change due to changes in georeferencing plus a larger change brought about by recycling.
  • 25. Phase III – QA - Verification • Summary statistics for all instances: • 75% of postcodes with multiple instances record no change in location whatsoever. • Of those that do exhibit location change, in 90% of cases this was between 1m and 3km with the remaining cases exhibiting a change of up to 500km. • Clearly it would be useful if we had a spatial threshold (like the 3 year temporal threshold) that we could use to decide whether 2 instances should be merged or kept separate as genuine reuses. • We argue that using a combination of temporal & spatial measures of similarity it is possible to discriminate between genuine and spurious instances.
  • 26. Phase III – QA - Verification • Research has only recently begun to engage with this problem; progress has been hindered by the size of the datasets involved and the pain involved in isolating indicative cases. • Significant time has been invested in exploring the problem but we are by no means experts - we need feedback - does this methodology seem appropriate - are our core assumptions logical? • Plans are to explore the effects of applying different threshold values - using known cases of reuse to inform selection of threshold value. • Pick a threshold value - determine the effects of applying this to the dataset as a whole in terms of e.g. the number of merges that this yields, taking samples to determine the validity of results - are instances inappropriately merged?
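Exploring candidate thresholds could look something like the sketch below, which counts how many consecutive instance pairs would be merged at each spatial threshold. The pair values are invented, and the merge rule used here (merge only when a pair is close in both time and space) is just one possible combination of the two measures, chosen for illustration; the slides leave the final rule open.

```python
# Hypothetical consecutive instance pairs for the same postcode:
# (gap in months between dates of introduction, distance moved in metres)
pairs = [
    (12, 0.0),
    (20, 80.0),
    (30, 2500.0),
    (60, 40.0),
    (96, 150000.0),
]


def count_merges(pairs, max_distance_m, min_gap_months=36):
    """Count pairs that are close in time AND within the spatial threshold."""
    return sum(
        1
        for gap, distance in pairs
        if gap < min_gap_months and distance <= max_distance_m
    )


# Sweep a few candidate spatial thresholds and report the merges each yields.
for threshold in (100, 1000, 3000):
    print(threshold, count_merges(pairs, threshold))
```

Sampling the merged pairs at each threshold, as the slide suggests, would then show whether a given value merges instances inappropriately.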
  • 27. Phase III – QA - Verification
  • 28. Phase III – QA - Verification • Demonstrate application of these rules by going back to the Spurious cases we looked at earlier. • Case A - using our temporal rule of 3 years - these 6 could be compressed to 3 instances. Using our spatial rule (assuming that our upper spatial threshold exceeds 100m) these could be compressed to a single instance.
  • 29. Phase III – QA - Verification • Case B - the inconsistent instance must either be terminated or merged with another instance. Applying the temporal rule it could be merged with the following instance. However its location is quite different and so we might decide that this falls outside our threshold and so instead we might terminate it with the start date of the following instance.
  • 30. Phase IV – Create QA Instance DB • At some point in order to move forward we are going to have to proceed, implement the rules from phase 3 and carry out the updates to the instance database. • In doing this we run the risk of going in one of two directions - we can either be too inclusive, leading to too many instances being merged together, or not inclusive enough, with not enough instances merged together. • We intend to be pragmatic about this - we simply cannot have so many possibly false instances associated with each postcode. It is unlikely that we are going to be able to resolve all cases. • Once the rules are in place, implementation of them should be fairly straightforward.
  • 31. Creation of Historic Snapshots • With the Quality Controlled Instance database in place, yearly historic versions of the postcode directories can then be derived by pulling out all instances that exist within a particular time slice.
  • 32. Outstanding Issues • Reconciling the spurious instances is still an ongoing task. • We would welcome comments/feedback about the assumptions/methodologies we have chosen to adopt both from ONS and from other expert users of the AFPD. • Is there any documentation which might shed light on procedures used to update the datasets in the past & might explain some of the systematic inconsistencies we have discovered?
  • 33. Conclusions • 1. Historical & Contemporary postcode directory datasets are being accessed by academic users through UKBORDERS. • 2. QA process data has been received and loaded - raw instance database has been created. • 3. Quality Assurance Audit has been carried out - quality of dataset has been assessed. • 4. Significant Progress has been made in reconciling inconsistencies, but work remains before derived data can be created and exposed to user community. • 5. Feedback on work to date and input from other users is requested in order to bring work to a close.
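Deriving a snapshot from the instance database amounts to a filter on the two dates. The sketch below assumes "YYYY-MM" date strings and one particular boundary convention (an instance counts as live from its date of introduction up to, but not including, its date of termination); both are illustrative assumptions rather than the project's stated rules.

```python
def snapshot(instances, when):
    """Return the instances that exist at the time slice `when`.

    instances: list of (postcode, doi, dot) with dates as "YYYY-MM"
    strings and dot set to None for instances that are still live.
    """
    return [
        (postcode, doi, dot)
        for postcode, doi, dot in instances
        if doi <= when and (dot is None or dot > when)
    ]


# Hypothetical quality-controlled instances.
instances = [
    ("AB1 0AA", "1980-01", "1990-06"),
    ("AB1 0AA", "1995-06", None),
    ("AB1 0AB", "1992-01", None),
]

print(snapshot(instances, "1985-01"))
print(snapshot(instances, "1996-01"))
```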
  • 34. Contact Details • http://edina.ac.uk/ • james.crone@ed.ac.uk • Questions?