Inside Overpass API - State of the Map 2013

Roland Olbricht
at SOTM 2013 in Birmingham

Overview
1. The server as a whole
2. Processing of requests
3. The query statement pipeline

4'000 to 6'000 unique IPs per day

150'000 to 250'000 requests per day

10 GB to 30 GB result data per day

Statistics of 2013-08-30

Download size per IP    # Unique IPs    Download size per request
> 1 GB                             4                      884'037
100 MB – 1 GB                     20                    8'513'484
10 MB – 100 MB                   150                       61'999
1 MB – 10 MB                     384                       20'245
100 KB – 1 MB                    595                        4'740
10 KB – 100 KB                 1'104                        1'696
1 KB – 10 KB                   1'189                          921
< 1 KB                           764                          339

Share resources
across 10^7!
=> [timeout:...]: the server keeps track of "free time units".
The server accepts a client request
if its timeout is below half of the free server time units.
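Client requests against a server state of 240000 free time units:

With timeout 180? Accepted, because 180 < 240000/2; 239820 units remain free.
With timeout 86400? Accepted, because 86400 < 239820/2; 153420 units remain free.
With timeout 86400? Rejected, because 153420/2 < 86400.
With timeout 180? Accepted, because 180 < 153420/2.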
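
The acceptance rule and the numbers on this slide can be replayed in a few lines of Python. This is only a sketch: the function names are hypothetical, and it assumes that an accepted request reserves exactly its timeout and that nothing is released during the replay, which is what the figure 153420 = 240000 - 180 - 86400 suggests.

```python
def admit(free_units: int, timeout: int) -> bool:
    """Accept a request if its timeout is below half of the free time units."""
    return timeout < free_units / 2


def simulate(free_units: int, timeouts: list[int]) -> tuple[list[bool], int]:
    """Replay requests in order and return the decisions and the units left."""
    decisions = []
    for timeout in timeouts:
        accepted = admit(free_units, timeout)
        decisions.append(accepted)
        if accepted:
            # Assumption: an accepted request holds its full timeout.
            free_units -= timeout
    return decisions, free_units


decisions, remaining = simulate(240000, [180, 86400, 86400, 180])
print(decisions)  # [True, True, False, True]
print(remaining)  # 153240
```

The second 86400 request fails only because the first one is still holding its units; a short request is still admitted afterwards.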


Share resources
across 10^7 !

Short allowed runtime: high priority
Long allowed runtime: low priority

Since June 2012:
all requests with [timeout:...] < 180 are accepted;
requests with a longer timeout are occasionally rejected.

The bottleneck ...

almost completely idle

… is disk I/O.

often peaks near 100%

"out" vs "out skel" vs "out meta"

Request
node
  [name="Aston Business School"];
out;

Disk time

Memory
(node 1473072867,
lat = 52.4867839,
lon = -1.8884618)

Output
(node 1473072867,
lat = 52.4867839,
lon = -1.8884618,
amenity=bicycle_parking
bcc_ref=433
bicycle_parking=stands
capacity=10
covered=yes
name=Aston Business School)

Request
node
  [name="Aston Business School"];
out skel;

Disk time

Memory
(node 1473072867,
lat = 52.4867839,
lon = -1.8884618)

Request
node
  [name="Aston Business School"];
out meta;

Disk time

Memory
(node 1473072867,
lat = 52.4867839,
lon = -1.8884618)

Output
(node 1473072867,
version = 2, timestamp = ...,
…,
lat = 52.4867839,
lon = -1.8884618,
amenity=bicycle_parking
bcc_ref=433
bicycle_parking=stands
capacity=10
covered=yes
name=Aston Business School)

Every statement takes disk time.
Internally, we only store skeletons.

Request
node
  [name="Aston Business School"];
out meta;

Disk time

Memory
(node 1473072867,
lat = 52.4867839,
lon = -1.8884618)
3. The query statement pipeline

The query statement is a pipeline

Planning decisions

Ids
  Collect ids of potential results
  Copy from memory if possible
  Derive geo index from query

Raw data
  Look up geo index by ids
  Fetch all skeletons
  Cheap filtering

Filtering
  Filter by key conditions
  Expensive filtering

More conditions are better than fewer.
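
The stages above can be mimicked with a toy in-memory store. This is a sketch under loud assumptions, not the server's actual data layout: the class ToyStore, the whole-degree tiles used as a stand-in for the geo index, the node ids 1 to 3 and their coordinates are all hypothetical. The two counters stand in for disk seeks.

```python
import math
from collections import defaultdict


def tile_of(lat, lon):
    """Hypothetical geo index: one tile per whole-degree cell."""
    return (math.floor(lat), math.floor(lon))


def tiles_in_bbox(south, west, north, east):
    """All tiles that a bounding box touches."""
    return {
        (la, lo)
        for la in range(math.floor(south), math.floor(north) + 1)
        for lo in range(math.floor(west), math.floor(east) + 1)
    }


class ToyStore:
    def __init__(self, nodes):
        # nodes: iterable of (node_id, lat, lon, tags)
        self.ids_by_tag = defaultdict(set)  # (key, value) -> node ids
        self.tile_by_id = {}                # node id -> tile
        self.blocks = defaultdict(list)     # tile -> [(node_id, lat, lon)]
        self.index_lookups = 0              # stand-in for index disk seeks
        self.block_reads = 0                # stand-in for skeleton disk seeks
        for node_id, lat, lon, tags in nodes:
            tile = tile_of(lat, lon)
            self.tile_by_id[node_id] = tile
            self.blocks[tile].append((node_id, lat, lon))
            for item in tags.items():
                self.ids_by_tag[item].add(node_id)

    def query(self, key, value, bbox=None):
        # Collect ids of potential results.
        ids = self.ids_by_tag.get((key, value), set())
        # Derive the geo index from the query if a bbox is given,
        # otherwise look it up once per id.
        if bbox is not None:
            tiles = tiles_in_bbox(*bbox)
        else:
            self.index_lookups += len(ids)
            tiles = {self.tile_by_id[i] for i in ids}
        # Fetch all skeletons of the selected tiles, then filter.
        result = []
        for tile in sorted(tiles):
            if tile not in self.blocks:
                continue
            self.block_reads += 1
            for node_id, lat, lon in self.blocks[tile]:
                if node_id not in ids:
                    continue
                if bbox is not None:
                    south, west, north, east = bbox
                    if not (south <= lat <= north and west <= lon <= east):
                        continue
                result.append((node_id, lat, lon))
        return result


NODES = [
    (1473072867, 52.487, -1.889,
     {"amenity": "bicycle_parking", "name": "Aston Business School"}),
    (1, 52.2, -1.5, {"amenity": "cafe"}),
    (2, 48.1, 11.5, {"amenity": "bicycle_parking"}),
    (3, 40.7, -74.0, {"amenity": "bicycle_parking"}),
]

store = ToyStore(NODES)
everywhere = store.query("amenity", "bicycle_parking")
print(len(everywhere), store.index_lookups, store.block_reads)
```

Running the same tag query with a small bounding box on a fresh store skips the per-id index lookups and reads a single block, which is the effect the following slides show at real scale.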

The query statement pipeline:
node[name="Aston Business School"];
Planning decisions
fetch all skeletons
cheap filtering
expensive filtering

Disk time

(node 1473072867)

(Idx 0x42f00f00)
(node 1473072867,
lat=52.487, lon=-1.889)

node[amenity=bicycle_parking];
Planning decisions
fetch all skeletons
cheap filtering
expensive filtering

Disk time

(node 1000, …,
node …, node …,
node 1473072867,
node …, node …) [~ 80'000 objects]
(Idx 0x1, 0x2, 0x3, …,
...)
((node 1, lat=..., lon=...),
…,
(node 1473072867, lat=52.487, lon=-1.889),
...)

…
~80'000 disk seeks

…
~30'000 disk seeks

node[amenity=bicycle_parking]
(52.48, -1.89, 52.49, -1.88);
Planning decisions
fetch all skeletons
cheap filtering
expensive filtering

Disk time

(node 1000, …,
node …, node …,
node 1473072867,
node …, node …) [~ 80'000 objects]

(Idx 0x42f00f00)
(node 1473072867,
lat=52.487, lon=-1.889)

node[name="Aston Business School"]
(51.0, -3.0, 60.0, 3.0);
Planning decisions
fetch all skeletons

(node 1473072867)
(Idx 0x42000000, …,
Idx 0x42ffffff)

(node 1473072867,
lat=52.487, lon=-1.889)

cheap filtering
expensive filtering

Disk time

…
~3'000 disk seeks

Summary

Be bold: the server takes care of large queries.
Select the right "out" mode for performance
and for quick testing.
Use all available information,
in particular small bounding boxes
and specific search conditions.