3. PostgreSQL
PostgreSQL is a powerful, open source object-relational
database system.
PostgreSQL evolved from the Ingres project at the University of California,
Berkeley. Its successor project, POSTGRES, was led by Michael Stonebraker
and began in 1986.
In 1995, two Ph.D. students from Stonebraker's lab, Andrew Yu and
Jolly Chen, replaced Postgres' POSTQUEL query language with an
extended subset of SQL. They renamed the system to Postgres95.
In 1996, Postgres95 departed from academia and started a new life
in the open source world, when the database system took its current
name: PostgreSQL. "Postgres" is still used as an easy-to-pronounce
nickname.
www.postgresql.org
4. PostGIS
PostGIS is a spatial database extender for PostgreSQL.
It adds support for geographic objects, allowing
location queries to be run in SQL.
PostGIS was first released in 2001.
http://postgis.net/
5. OS MasterMap Topography Layer
OS MasterMap Topography Layer is the most detailed and accurate view
of Great Britain's landscape – from roads to fields, to buildings and trees,
fences, paths and more.
There are approximately 460 million features in OS MasterMap Topography Layer.
Change Only Update (CoU)
Because of the large number of features in OS MasterMap,
updates are available as CoU.
Typically a CoU for national supply contains
fewer than 6 million features.
https://www.ordnancesurvey.co.uk/business-and-government/products/topography-layer.html
6. Loader
A powerful GML & KML loader (and translator) written in Python that makes
use of OGR 1.9.
Source data can be in GML (including .gz) or KML format and can be output
to any of the formats supported by OGR.
The source data can be prepared and enhanced during loading to
● make it suitable for loading with OGR (useful with complex feature types)
● add value by deriving attributes
Fairly fast (national coverage of OS MasterMap in 2 days)
● Run 6 instances in parallel
● Use the OGR PGDump driver to output SQL and use COPY to load
the data.
http://github.com/AstunTechnology/Loader
7. Loading CoU
Loading CoU data is basically the same as loading standard
MasterMap.
Two differences:
1. Extra feature type: Departed Features.
2. Need to apply the changes after you have loaded the data.
It is easier if you load the CoU data
into a separate schema, e.g.
● osmm_topo (main holding)
● osmm_topo_cou (CoU data)
8. Loading CoU cont... Applying the Changes
Remove all the departed features from the main holding.
Then, for all the changed records, do an UPSERT.
For speed we do a delete followed by an insert rather than a true UPSERT.
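The apply step can be sketched as follows. This is a minimal illustration and not the authors' script: Python's built-in sqlite3 module stands in for PostgreSQL, and the table names (topographicarea, topographicarea_cou, departedfeature) and the two-column layout are assumptions made for the example. In PostgreSQL the same three statements would target tables in the osmm_topo and osmm_topo_cou schemas.

```python
import sqlite3

def apply_cou(conn):
    """Apply a Change Only Update to the main holding (illustrative only)."""
    with conn:  # one transaction: all three steps commit or roll back together
        # 1. Remove departed features from the main holding.
        conn.execute(
            "DELETE FROM topographicarea "
            "WHERE fid IN (SELECT fid FROM departedfeature)")
        # 2. Delete the old version of every changed feature ...
        conn.execute(
            "DELETE FROM topographicarea "
            "WHERE fid IN (SELECT fid FROM topographicarea_cou)")
        # 3. ... and insert the new versions (delete + insert, not UPSERT).
        conn.execute(
            "INSERT INTO topographicarea SELECT * FROM topographicarea_cou")

# Tiny made-up data set: 'a' departs, 'b' changes, 'c' is untouched, 'd' is new.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topographicarea     (fid TEXT PRIMARY KEY, version INTEGER);
    CREATE TABLE topographicarea_cou (fid TEXT PRIMARY KEY, version INTEGER);
    CREATE TABLE departedfeature     (fid TEXT PRIMARY KEY);
    INSERT INTO topographicarea     VALUES ('a', 1), ('b', 1), ('c', 1);
    INSERT INTO topographicarea_cou VALUES ('b', 2), ('d', 1);
    INSERT INTO departedfeature     VALUES ('a');
""")
apply_cou(conn)
result = conn.execute(
    "SELECT fid, version FROM topographicarea ORDER BY fid").fetchall()
print(result)
```

Running the three steps in one transaction means a failure part-way through leaves the main holding unchanged.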
9. We do a little bit more….
BUT what about keeping the history? For that we use AUDIT :)
Identify changed areas
● Add the geometry to the departed feature table
● Create a view of changed features
● Create a table of 500m grid squares which have changed
● Use this table to update the tile caches where the data has
changed
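One way to picture the changed-squares step is the sketch below. It assumes coordinates are in metres (as in British National Grid) and works from feature bounding boxes, which over-estimates the area touched by a feature; the deck builds this table inside the database and does not show how, so the function name and approach here are hypothetical.

```python
import math

GRID_SIZE = 500  # metres

def changed_squares(bboxes, size=GRID_SIZE):
    """Return lower-left corners of the grid squares touched by any bbox.

    Each bbox is (minx, miny, maxx, maxy) in metres.
    """
    squares = set()
    for minx, miny, maxx, maxy in bboxes:
        x0, x1 = math.floor(minx / size), math.floor(maxx / size)
        y0, y1 = math.floor(miny / size), math.floor(maxy / size)
        for ix in range(x0, x1 + 1):
            for iy in range(y0, y1 + 1):
                squares.add((ix * size, iy * size))
    return squares

# Made-up bounding boxes of changed features.
boxes = [
    (400100.0, 100100.0, 400200.0, 100200.0),  # sits inside one square
    (400450.0, 100450.0, 400550.0, 100550.0),  # straddles four squares
]
squares = changed_squares(boxes)
print(sorted(squares))
```

The resulting set of squares is what would drive the tile cache refresh: only tiles overlapping those squares need to be regenerated.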
Two-phase validation
● Compare the number of features loaded into the CoU tables for each file
with a report generated by a Python script which parses the .gz files
● Load FVDs and compare TOIDs, version number & version data
with the updated data.
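The first validation phase could look something like the sketch below. It is not the authors' script: it assumes each feature in the supplied GML carries a fid attribute, and the sample document, its namespace URI and the "loaded" counts are made up for the example.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from collections import Counter

def count_features(gz_file):
    """Count elements that carry a 'fid' attribute, grouped by local tag name."""
    counts = Counter()
    with gzip.open(gz_file, "rb") as fh:
        for _event, elem in ET.iterparse(fh, events=("end",)):
            if "fid" in elem.attrib:
                local_name = elem.tag.rsplit("}", 1)[-1]  # strip the namespace
                counts[local_name] += 1
            elem.clear()  # release element contents to reduce memory use
    return counts

# Made-up sample standing in for one supplied .gz file.
sample = (
    b"<osgb:FeatureCollection xmlns:osgb='http://example.com/osgb'>"
    b"<osgb:member><osgb:TopographicArea fid='osgb1'/></osgb:member>"
    b"<osgb:member><osgb:TopographicArea fid='osgb2'/></osgb:member>"
    b"<osgb:member><osgb:DepartedFeature fid='osgb3'/></osgb:member>"
    b"</osgb:FeatureCollection>"
)
counts = count_features(io.BytesIO(gzip.compress(sample)))

# Hypothetical per-file counts queried from the CoU tables after loading.
loaded = {"TopographicArea": 2, "DepartedFeature": 1}
mismatches = {
    name: (counts[name], loaded.get(name))
    for name in counts
    if counts[name] != loaded.get(name)
}
print(dict(counts), mismatches)
```

An empty mismatches dictionary means the file and the database agree on the number of features of each type.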
10. PostgreSQL HSTORE
A PostgreSQL extension which implements the hstore data type for
storing sets of key/value pairs within a single PostgreSQL value.
This can be useful in various scenarios, such as rows with many
attributes that are rarely examined, or semi-structured data.
Keys and values are simply text strings.
Key functions/operators are:
● hstore(record): construct an hstore from a record or row
● populate_record(record, hstore): replace fields in record with matching values from hstore
● hstore - hstore: delete matching pairs from left operand (so can store changes)
http://www.postgresql.org/docs/current/static/hstore.html
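To make the semantics of these operators concrete, here is a small Python model that uses plain dictionaries of strings in place of hstore values. It only imitates the behaviour described above; hstore_minus and populate_record here are illustrative Python functions, not the PostgreSQL implementations, and the sample row is made up.

```python
def hstore_minus(left, right):
    """Model of `hstore - hstore`: drop pairs from left that also appear in right."""
    return {k: v for k, v in left.items() if k not in right or right[k] != v}

def populate_record(record, store):
    """Model of populate_record: overwrite fields of record that have a key in store."""
    updated = dict(record)
    for key, value in store.items():
        if key in updated:  # keys with no matching field are ignored
            updated[key] = value
    return updated

# Made-up old and new versions of one row, as text key/value pairs.
old_row = {"fid": "osgb1", "theme": "Land", "version": "3"}
new_row = {"fid": "osgb1", "theme": "Land", "version": "4"}

changed = hstore_minus(new_row, old_row)   # only the fields that differ
restored = populate_record(new_row, hstore_minus(old_row, new_row))
print(changed, restored)
```

Subtracting the old row from the new one leaves just the changed fields, which is why the operator is handy for storing changes compactly.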
11. PostgreSQL Audit trigger 91plus
https://wiki.postgresql.org/wiki/Audit_trigger_91plus
● Generic trigger function used for recording changes to tables
into an audit log table.
● Row values are recorded as HSTORE fields rather than as flat text.
● Auditing can be done coarsely at a statement level or finely at a
row level.
● Control is per-audited-table.
Trigger does not track:
● SELECT
● DDL like ALTER TABLE
● Changes to system catalogs
The trigger does record that a TRUNCATE has happened,
but not the values of the rows affected by the truncate.
12. What's great about
PostgreSQL Audit trigger 91plus?
Obviously the Audit triggers need to be applied before
changes are made to the data.
Let's look at some audit data....
● Very simple to turn audit on
SELECT audit.audit_table('<schema name>.<table name>');
● You can audit any table in the database.
● No extra columns required on the tables being audited.
● All changes are held in the table audit.logged_actions.
● Changes are only visible to roles which have the appropriate
privileges
13. How to Create a “Point in Time” Snapshot
First, check that the table has not been truncated after the ‘snapshot date’!
Using the HSTORE function populate_record, create a view ‘changes_after’ containing
● the first change per primary key from the audit table after the ‘date’
● an extra column indicating the change, i.e. D, U, I
14. Changes after view...
CREATE TEMPORARY VIEW changes_after AS
SELECT action, (rec).*
FROM (-- keep only the earliest audit event per fid after the snapshot date;
      -- DISTINCT ON needs its ORDER BY at the same query level to be reliable
      SELECT DISTINCT ON (row_data -> 'fid')
             action,
             populate_record(null::osmm_topo.topographicarea, row_data) AS rec
      FROM audit.logged_actions
      WHERE schema_name = 'osmm_topo'
        AND table_name = 'topographicarea'
        AND action_tstamp_tx > '2016-06-01 20:00:00'::timestamp
      ORDER BY row_data -> 'fid', event_id
     ) foo;
15. Create Snapshot
Create a view/table which includes
● All the records in the current table whose PK is not in the
changes_after view
plus
● All the records in the changes_after view where the change is D or U
Let's look at some example data....
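CREATE TABLE snapshot AS
-- rows unchanged since the snapshot date; action comes first to match changes_after
SELECT null::text AS action, t.* FROM osmm_topo.topographicarea t
WHERE t.fid NOT IN ( SELECT fid FROM changes_after )
UNION ALL
-- pre-change values of rows deleted or updated since the snapshot date
SELECT * FROM changes_after WHERE action IN ('D','U');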
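The snapshot logic can be checked on paper with a small Python model. It assumes, as the audit trigger's documentation describes, that row_data holds the old row for updates and deletes and the new row for inserts. The dictionaries, integer timestamps and function names are stand-ins invented for this sketch; they are not part of the audit schema.

```python
def changes_after(audit_log, snapshot_time):
    """First logged change per fid after snapshot_time (models DISTINCT ON)."""
    first = {}
    for event in sorted(audit_log, key=lambda e: e["event_id"]):
        if event["tstamp"] > snapshot_time:
            first.setdefault(event["row_data"]["fid"], event)
    return first

def snapshot(current_rows, audit_log, snapshot_time):
    """Rebuild the table contents as they were at snapshot_time."""
    changed = changes_after(audit_log, snapshot_time)
    # Rows untouched since the snapshot date are taken from the current table.
    rows = {fid: row for fid, row in current_rows.items() if fid not in changed}
    # Rows later updated or deleted are restored from their pre-change values.
    for fid, event in changed.items():
        if event["action"] in ("D", "U"):
            rows[fid] = event["row_data"]
    return rows

# Made-up audit log; the snapshot date is t = 10.
audit_log = [
    {"event_id": 1, "tstamp": 5,  "action": "U", "row_data": {"fid": "c", "version": 0}},
    {"event_id": 2, "tstamp": 11, "action": "U", "row_data": {"fid": "a", "version": 1}},
    {"event_id": 3, "tstamp": 12, "action": "U", "row_data": {"fid": "a", "version": 2}},
    {"event_id": 4, "tstamp": 13, "action": "D", "row_data": {"fid": "b", "version": 1}},
    {"event_id": 5, "tstamp": 14, "action": "I", "row_data": {"fid": "d", "version": 1}},
]
current = {
    "a": {"fid": "a", "version": 3},
    "c": {"fid": "c", "version": 1},
    "d": {"fid": "d", "version": 1},
}
result = snapshot(current, audit_log, snapshot_time=10)
print(result)
```

Feature "d" was inserted after the snapshot date, so it drops out; "a" and "b" come back with the values they had before their first later change; "c" is unchanged since then and is read from the current table.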