*** Presented by Roland Olbricht at State of the Map 2013
*** For the video of this presentation please see http://lanyrd.com/2013/sotm/scpkhk/
*** Full schedule available at http://wiki.openstreetmap.org/wiki/State_Of_The_Map_2013
Overpass API has become the most used database for extracting OSM data over the web. Yet it has remained highly available. This high availability of Overpass API is ensured by its load management. In the first part of the talk we will discuss statistics on the historic and current usage of Overpass API. In the second part, we will present the mechanisms used for this load management and explain how to assess the footprint of a query. Finally, we will advise how to get responses to your queries as fast as possible.
6. Share resources across 10^7!
=> [timeout:...]: the server keeps track of "free time units".
The server accepts a client request if its timeout is below half of the free server time units.

Client request        | Server state
                      | 240000 free time units
with timeout 180?     | accepted: 239820 free time units
with timeout 86400?   | accepted: 153420 free time units
with timeout 86400?   | rejected, because 153420/2 < 86400:
                      | 153420 free time units
with timeout 180?     | accepted: 153240 free time units
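The admission rule above can be sketched in a few lines. This is my own minimal model of the rule as stated on the slide, not the actual Overpass API server code; the numbers are the slide's example:

```python
# Minimal sketch of the slide's admission rule: a request with a given
# [timeout:...] is accepted only if the timeout is below half of the
# currently free server time units; accepted requests reserve their units.

class LoadManager:
    def __init__(self, free_units: int):
        self.free_units = free_units

    def request(self, timeout: int) -> bool:
        """Accept and reserve time units, or reject the request."""
        if timeout < self.free_units / 2:
            self.free_units -= timeout
            return True
        return False

server = LoadManager(240000)
print(server.request(180), server.free_units)    # accepted, 239820 left
print(server.request(86400), server.free_units)  # accepted, 153420 left
print(server.request(86400), server.free_units)  # rejected: 153420/2 < 86400
print(server.request(180), server.free_units)    # accepted, 153240 left
```

Replaying the slide's four requests reproduces exactly the server states shown above.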
7. Share resources across 10^7!
Short allowed runtime => high priority.
Long allowed runtime => low priority.
Since June 2012,
all requests with [timeout:...] < 180 are accepted;
requests with a longer timeout are occasionally rejected.
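In practice this means it pays to declare a short timeout explicitly. A minimal sketch of composing such a query string (only string building, nothing is sent; the helper function is my own, the query text is the slides' example):

```python
# Build (but do not send) an Overpass QL query with an explicit, short
# [timeout:...] setting, so it falls into the high-priority class
# described above. build_query is an illustrative helper, not part of
# any Overpass client library.

def build_query(ql: str, timeout: int = 180) -> str:
    """Prefix a query with an explicit [timeout:...] global setting."""
    return f"[timeout:{timeout}];{ql}"

q = build_query('node[name="Aston Business School"];out;', timeout=60)
print(q)  # [timeout:60];node[name="Aston Business School"];out;
```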
10. "out" vs "out skel" vs "out meta"
Request:
node[name="Aston Business School"];
out;

Disk time / Memory:
(node 1473072867,
 lat = 52.4867839,
 lon = -1.8884618,
 amenity=bicycle_parking,
 bcc_ref=433,
 bicycle_parking=stands,
 capacity=10,
 covered=yes,
 name=Aston Business School)
11. "out" vs "out skel" vs "out meta"
Request:
node[name="Aston Business School"];
out skel;

Disk time / Memory:
(node 1473072867,
 lat = 52.4867839,
 lon = -1.8884618)
12. "out" vs "out skel" vs "out meta"
Request:
node[name="Aston Business School"];
out meta;

Disk time / Memory:
(node 1473072867,
 version = 2, timestamp = ..., …,
 lat = 52.4867839,
 lon = -1.8884618,
 amenity=bicycle_parking,
 bcc_ref=433,
 bicycle_parking=stands,
 capacity=10,
 covered=yes,
 name=Aston Business School)
13. "out" vs "out skel" vs "out meta"
Every statement takes disk time.
Internally, we only store skeletons.

Request:
node[name="Aston Business School"];
out meta;

Disk time / Memory:
(node 1473072867,
 lat = 52.4867839,
 lon = -1.8884618)
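The difference between the three output modes can be seen as field selection: each mode returns more data, and therefore costs more disk time and memory. The following is my own simplified model, not the server's implementation; the node and tags are the slides' example:

```python
# Simplified model of "out skel" / "out" / "out meta": the skeleton is
# always stored internally; tags and meta data are added on demand.

skeleton = {"id": 1473072867, "lat": 52.4867839, "lon": -1.8884618}
tags = {"amenity": "bicycle_parking", "bcc_ref": "433",
        "bicycle_parking": "stands", "capacity": "10",
        "covered": "yes", "name": "Aston Business School"}
meta = {"version": 2, "timestamp": "..."}

def out(node_skeleton, node_tags, node_meta, mode="body"):
    """Return the fields printed by 'out skel', 'out', or 'out meta'."""
    if mode == "skel":              # skeleton only: the cheapest mode
        return dict(node_skeleton)
    if mode == "body":              # plain 'out': skeleton plus tags
        return {**node_skeleton, **node_tags}
    if mode == "meta":              # everything: the most expensive mode
        return {**node_skeleton, **node_tags, **node_meta}
    raise ValueError(mode)

print(sorted(out(skeleton, tags, meta, "skel")))  # ['id', 'lat', 'lon']
```

For quick testing of a query, "out skel" is therefore the mode of choice.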
15. The query statement is a pipeline
Planning decisions:
Ids:
- collect ids of potential results (copy from memory if possible)
- derive geo index from query
Raw data:
- look up geo index by ids
- fetch all skeletons
Filtering (cheap filtering first, expensive filtering last):
- filter by key conditionals
More conditions are better than fewer.
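The stages above can be sketched as a toy pipeline. This is my own simplification, not the actual Overpass API implementation; the two example nodes are invented except for the Aston Business School node from the slides:

```python
# Toy model of the query pipeline: collect candidate ids, fetch their
# skeletons (the disk-seek cost), then filter by key conditionals.
# Every extra condition can only shrink the result, which is why more
# conditions are better than fewer.

DB = {  # node id -> (geo index block, skeleton, tags)
    1473072867: (0x42f00f00, (52.487, -1.889),
                 {"name": "Aston Business School",
                  "amenity": "bicycle_parking", "capacity": "10"}),
    1000: (0x1, (51.0, 0.1), {"amenity": "bicycle_parking"}),
}

def query(conditions):
    # Ids: collect ids of potential results (here: all known ids).
    ids = sorted(DB)
    # Raw data: look up the geo index by ids, then fetch all skeletons.
    # Each fetched skeleton stands for roughly one disk seek.
    fetched = {i: DB[i][1] for i in ids}
    # Filtering: filter by key conditionals.
    return [i for i in fetched
            if all(DB[i][2].get(k) == v for k, v in conditions.items())]

print(query({"amenity": "bicycle_parking"}))     # [1000, 1473072867]
print(query({"name": "Aston Business School"}))  # [1473072867]
```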
16. The query statement pipeline:
node[name="Aston Business School"];

Planning decisions / Disk time:
- collect ids of potential results (copy from memory if possible):
  (node 1473072867)
- derive geo index from query; look up geo index by ids:
  (Idx 0x42f00f00)
- fetch all skeletons:
  (node 1473072867, lat=52.487, lon=-1.889)
- cheap filtering: filter by key conditionals; expensive filtering
17. The query statement pipeline:
node[amenity=bicycle_parking];

Planning decisions / Disk time:
- collect ids of potential results (copy from memory if possible):
  (node 1000, …, node …, node 1473072867, node …, node …) [~80'000 objects]
- derive geo index from query; look up geo index by ids (~80'000 disk seeks):
  (Idx 0x1, 0x2, 0x3, …, ...)
- fetch all skeletons (~30'000 disk seeks):
  ((node 1, lat=..., lon=...), …, (node 1473072867, lat=52.487, lon=-1.889), ...)
- cheap filtering: filter by key conditionals; expensive filtering
18. The query statement pipeline:
node[amenity=bicycle_parking]
  (52.48, -1.89, 52.49, -1.88);

Planning decisions / Disk time:
- collect ids of potential results (copy from memory if possible):
  (node 1000, …, node …, node 1473072867, node …, node …) [~80'000 objects]
- derive geo index from query; look up geo index by ids:
  (Idx 0x42f00f00)
- fetch all skeletons:
  (node 1473072867, lat=52.487, lon=-1.889)
- cheap filtering: filter by key conditionals; expensive filtering
19. The query statement pipeline:
node[name="Aston Business School"]
  (51.0, -3.0, 60.0, 3.0);

Planning decisions / Disk time:
- collect ids of potential results (copy from memory if possible):
  (node 1473072867)
- derive geo index from query; look up geo index by ids:
  (Idx 0x42000000, …, Idx 0x42ffffff)
- fetch all skeletons (~3'000 disk seeks):
  (node 1473072867, lat=52.487, lon=-1.889)
- cheap filtering: filter by key conditionals; expensive filtering
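The seek counts on these slides follow from how many geo-index blocks survive the planning decisions: a bounding box restricts the set of index blocks the server must read, and each block costs roughly one disk seek. A back-of-the-envelope sketch of that intersection (my own model with invented block bounds, not Overpass API code):

```python
# Sketch of why a small bounding box cuts disk seeks: only index blocks
# whose area intersects the query's bounding box need to be read.
# BLOCKS and its coordinates are invented for illustration.

def blocks_for_bbox(blocks, bbox):
    """Keep only index blocks whose area intersects the bounding box."""
    if bbox is None:                 # no bounding box: read everything
        return set(blocks)
    south, west, north, east = bbox
    return {b for b, (s, w, n, e) in blocks.items()
            if s <= north and n >= south and w <= east and e >= west}

BLOCKS = {  # block id -> (south, west, north, east)
    0x42f00f00: (52.48, -1.89, 52.50, -1.87),
    0x42f00f01: (52.48, -1.87, 52.50, -1.85),
    0x1:        (51.00,  0.00, 51.50,  0.50),
    0x2:        (60.00,  2.00, 60.50,  2.50),
}

no_bbox = blocks_for_bbox(BLOCKS, None)
small = blocks_for_bbox(BLOCKS, (52.48, -1.89, 52.49, -1.88))
print(len(no_bbox), "seeks without a bbox,", len(small), "with a small bbox")
```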
20. Summary
Be bold: the server takes care of large queries.
Select the right "out" mode, for performance
and for quick testing.
Use all available information,
in particular small bounding boxes
and specific search conditionals.