In 2001 Ian Painter led the team responsible for OS MasterMap. This ground-breaking project took Ordnance Survey’s Land-Line product and created the world’s first database of ‘real world objects’. Now, some 10 years on, this talk will look back at the original premise of its creation and how it’s been used over the years.
First and foremost, OSMM was designed as seamless data, and a big hope for the product was that, unlike Land-Line, it would no longer be used as a backdrop map. Many organisations use OSMM as a backdrop map, but by doing so they’re missing a huge amount of its value. It’s now time to put the map aside and use the data. Welcome to Big Data.
The second half of this talk will introduce Big Data concepts for analysis and query: how a Big Data platform scales on commodity hardware, makes extensive use of parallel processing and works with GI data in a manner totally different from anything we’ve seen before. Not just that, but how Big Data offers all this at a fraction of the cost of traditional GIS and relational databases. Big Data will finally realise the original vision of OSMM – and when that happens OS will need to change the name!
OS MasterMap it's not a map - but data
1. OS MasterMap it’s not a map – but data
Ian Painter
Snowflake Software
2. About me
• I don’t represent Ordnance Survey,
• I worked there for 10 years
• My opinions are my own
3. First Some History
• County Series was Ordnance Survey’s large scale paper product
• Landline was Ordnance Survey’s first digital product
• Built to print paper maps quicker
• Blind digitised … so plenty of
6. Fastrack
• Take Landline as input
• Clean it up … like really clean it up
• Stitch it together (edgematching)
• Polygonise it
• Restructure the road network
• Beef up the attribution
• Multi-level structures
• Classify all the polygons
• Associate all the cartographic text
• Only 1% manual editing
• Complete in a year
7. Geospatial Object Server
All very well creating all this stuff but we need somewhere to put it:
• Store it all seamlessly
• Maintain the topology
• Store change
• Seamless ordering
• Huge data volumes
• From 100 features to 450 million
• Built on an object oriented database called ObjectStore
8. Fastrack + GOS =
• Unique identifiers
• Real world
• Seamless Product
• Change-only-update
• Delivered as GML
9. Impacts on the Industry
• Data management
– From files to databases
• Large data volumes
• Complex data models
– From simple features to table joins, multiple geometries
• GML
– XML rather than proprietary
• Change only update
– Individual feature update rather than file replacements
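To make "individual feature update rather than file replacements" concrete, here is a minimal Python sketch. It assumes a feature store keyed by unique identifier and a change list of insert, update and delete actions. The identifier strings, attribute names and action names are made up for illustration and are not taken from the OS MasterMap specification.

```python
# Hypothetical in-memory feature store, keyed by unique identifier.
features = {
    "id-0001": {"version": 1, "theme": "Buildings"},
    "id-0002": {"version": 3, "theme": "Roads"},
}


def apply_cou(store, changes):
    """Apply a change-only update: touch only the features that changed."""
    for action, feature_id, record in changes:
        if action == "delete":
            store.pop(feature_id, None)
        elif action in ("insert", "update"):
            store[feature_id] = record
        else:
            raise ValueError(f"unknown action: {action}")
    return store


# Hypothetical change set: one update, one delete, one insert.
changes = [
    ("update", "id-0002", {"version": 4, "theme": "Roads"}),
    ("delete", "id-0001", None),
    ("insert", "id-0003", {"version": 1, "theme": "Land"}),
]

apply_cou(features, changes)
```

Only three records are touched here; with file replacement the whole store would be reloaded for the same result.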
10. Key Market Selling Points
• A data product
– Clean
– Structured
– Rich attribution
– Unique identifiers
• Seamless - designed for query and analysis
• Change Intelligence
– Change triggers
– Historical archives
11. How did we do 10 years on?
• Map vs data
– Coloured backdrop map
– Cloud web mapping is a step backwards
• Change Only
– Slow start but most now applying COU
– Little use of COU for change intelligence queries
• Identifiers
– Core referencing hasn’t really worked
• Seamless
– Very little spatial analysis
12. But Why?
• Data model capabilities of GIS are very limiting
• Too much focus on web mapping
– Even more so with online mapping portals
• Proprietary nature of GIS
– The limitation of its formats
– Preference for file based data management
– Lack of integration with mainstream IT
• Functionality is focused on the map, not the data
• Huge hardware requirements for spatial analytics
13. So what’s going to change all this?
When are we going to drop the map?
Isn’t it just data? Big Data?
14. Well it’s not Big Data … but I needed a link!
• Big Data is a buzz word!
• A technology paradigm to massage your ego
• Everybody likes to have something … BIG
15. Seriously … a Big Data 101
• Big Data looks at data in two ways
1. Structured Data – think schema, data models
2. Unstructured Data – free text, insurance claim, transaction log
• Structured Data tends to be stored in a database
– But not any old database … a NoSQL database
• Unstructured Data tends to be stored on a file system
– But not any old file system … a distributed filesystem
– And then processed through a paradigm called MapReduce
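For anyone who hasn't met MapReduce, the paradigm can be sketched in a few lines of plain Python: a map step emits key/value pairs, a shuffle step groups them by key, and a reduce step collapses each group. This is only a single-process illustration, not Hadoop's API. The record layout and function names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical input records; on a real cluster these would be spread
# across many machines on a distributed filesystem.
records = [
    {"id": "a", "theme": "Buildings"},
    {"id": "b", "theme": "Roads"},
    {"id": "c", "theme": "Buildings"},
]


def map_phase(record):
    """Emit one (key, value) pair per record."""
    yield record["theme"], 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


pairs = [pair for record in records for pair in map_phase(record)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because each map call looks at one record and each reduce call looks at one key, both steps can be run in parallel across commodity machines.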
16. So what’s all the tech?
• NoSQL Databases:
– Columnar, Document, Key-Value and Graph
– Netezza, Vertica, Teradata, MongoDB
• Unstructured data:
– Hadoop: HDFS, MapReduce, Amazon Elastic MapReduce
• Can also be hardware
– Lots of hardware
17. Who uses this stuff?
• Yahoo created Hadoop; it runs at:
– Facebook, Twitter and eBay
– Heavy use in Telco, Finance and Retail
• Facebook runs the largest Hadoop cluster in existence
– 21 PB of storage
– 2,000 (8-core) machines
– 12 TB per machine
– 32 GB of RAM
18. But what about Geo support?
• IBM Netezza has native spatial
• MongoDB
• ESRI ArcGIS 10.1 has native Netezza support
• But that’s only required if we think of geo as map
• Geo won’t be just maps for much longer …
• Geo is just a collection of relationships
– This is next to this, this is inside that
• NoSQL databases are far better suited to graphs
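As a rough illustration of "this is next to this, this is inside that", the sketch below stores geo as labelled relationships and answers a containment question without touching any geometry. The class, relation names and place names are all hypothetical, and a real graph database would offer far more than this.

```python
from collections import defaultdict


class RelationGraph:
    """Tiny labelled graph: (subject, relation) -> set of objects."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, subject, relation, obj):
        self._edges[(subject, relation)].add(obj)

    def related(self, subject, relation):
        return set(self._edges.get((subject, relation), ()))

    def containers(self, subject):
        """Follow 'inside' relationships transitively, nearest first."""
        found, seen, stack = [], set(), [subject]
        while stack:
            current = stack.pop()
            for parent in sorted(self._edges.get((current, "inside"), ())):
                if parent not in seen:
                    seen.add(parent)
                    found.append(parent)
                    stack.append(parent)
        return found


graph = RelationGraph()
graph.add("shop", "inside", "shopping_centre")
graph.add("shopping_centre", "inside", "town")
graph.add("shop", "next_to", "cafe")

where_is_shop = graph.containers("shop")
neighbours = graph.related("shop", "next_to")
```

The answer comes back as a list of names, not a map.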
19. A Big Data problem …
• NFC (Near Field Communication) will put a spatial component on every cash transaction
• Think of running a spatial query against national OSMM to check if the NFC transaction happened in the correct location
– So that’s 1 x,y against 440 million features
• NFC has the potential to overtake cash in less than 10 years
– We’re talking 100s of millions vs 100s of millions a minute … now that’s a Big Data problem
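One way to picture the "1 x,y against 440 million features" check is a coarse grid index followed by a point-in-polygon test, so each transaction is compared against a handful of candidate features instead of all of them. This is a single-machine sketch under stated assumptions: the cell size, feature names and coordinates are invented, and it is not how any particular Big Data platform implements spatial queries.

```python
from collections import defaultdict

CELL = 100.0  # hypothetical grid cell size in metres


def point_in_polygon(x, y, ring):
    """Ray-casting test against a simple polygon given as (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def build_index(features):
    """Register each feature in every grid cell its bounding box touches."""
    index = defaultdict(list)
    for feature_id, ring in features.items():
        xs = [p[0] for p in ring]
        ys = [p[1] for p in ring]
        for cx in range(int(min(xs) // CELL), int(max(xs) // CELL) + 1):
            for cy in range(int(min(ys) // CELL), int(max(ys) // CELL) + 1):
                index[(cx, cy)].append(feature_id)
    return index


def locate(x, y, features, index):
    """Return the features containing the point, testing one cell only."""
    cell = (int(x // CELL), int(y // CELL))
    return [fid for fid in index.get(cell, [])
            if point_in_polygon(x, y, features[fid])]


# Two made-up polygons standing in for national coverage.
features = {
    "shop": [(10, 10), (60, 10), (60, 60), (10, 60)],
    "car_park": [(150, 150), (250, 150), (250, 250), (150, 250)],
}
index = build_index(features)
hit = locate(30, 30, features, index)
```

Each lookup is independent of every other, which is what makes the per-transaction check a candidate for spreading across many machines.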
20. GIS ain’t gonna cut it
• Step up
– Big Data
– Geo as graphs and relationships
• Take a seat
– GIS and rest your tired data model capabilities
– Maps and cartography
– Relational databases
• Use Geo to give me an answer but don’t show me a map
21. In summary
• Despite OS MasterMap being over 10 years old
– It’s still ahead of GIS capabilities
– A lot of inherent value still isn’t used
• Spatial is not special, it’s just another data type
• Too much focus on a map to solve everything
• Big Data is coming, Geo will be very different
• Some sectors will skip GIS altogether
• Think data, not maps
• Graphs, not geometry