Inside Overpass API - State of the Map 2013

*** Presented by Roland Olbricht at State of the Map 2013
*** For the video of this presentation please see http://lanyrd.com/2013/sotm/scpkhk/
*** Full schedule available at http://wiki.openstreetmap.org/wiki/State_Of_The_Map_2013

Overpass API has become the most used database for extracting OSM data over the web, yet it has remained highly available. This availability is ensured by its load management. In the first part of the talk we discuss statistics on the historic and current usage of Overpass API. In the second part, we present the mechanisms used for this load management and explain how to assess the footprint of a query. Finally, we advise how to get responses to your queries as fast as possible.

Published in: Technology, Business

Transcript

  • 1. Inside Overpass API. Roland Olbricht at SOTM 2013 in Birmingham
  • 2. Overview: 1. The server as a whole; 2. Processing of requests; 3. The query statement pipeline
  • 3. 1. The server as a whole
  • 4. 4000 to 6000 unique IPs per day; 150'000 to 250'000 requests per day; 10 GB to 30 GB result data per day
  • 5. Statistics of 2013-08-30
       Download size per IP | # Unique IPs | Download size per request
       > 1 GB               | 4            | 884'037
       100 MB – 1 GB        | 20           | 8'513'484
       10 MB – 100 MB       | 150          | 61'999
       1 MB – 10 MB         | 384          | 20'245
       100 KB – 1 MB        | 595          | 4'740
       10 KB – 100 KB       | 1104         | 1'696
       1 KB – 10 KB         | 1189         | 921
       < 1 KB               | 764          | 339
  • 6. Share resources across 10^7 ! => [timeout:...]: The server keeps track of "free time units". The server accepts a client request if its timeout is below half of the free server time units. Example (server state, then client request): 240000 free time units; request with timeout 180: accepted; 239820 free time units; request with timeout 86400: accepted; 153420 free time units; request with timeout 86400: rejected, because 153420/2 < 86400; 153420 free time units; request with timeout 180: accepted; 153240 free time units.
  • 7. Share resources across 10^7 ! Short allowed runtime: high priority. Long allowed runtime: low priority. Since June 2012, all requests with [timeout:...] < 180 have been accepted; requests with a longer timeout are occasionally rejected.
  • 8. 2. Processing of requests
  • 9. The bottleneck is disk I/O: it often peaks near 100%, while … is almost completely idle.
  • 10. "out" vs "out skel" vs "out meta". Request: node[name="Aston Business School"]; out; Disk time / Memory: (node 1473072867, lat = 52.4867839, lon = -1.8884618), then (node 1473072867, lat = 52.4867839, lon = -1.8884618, amenity=bicycle_parking, bcc_ref=433, bicycle_parking=stands, capacity=10, covered=yes, name=Aston Business School)
  • 11. "out" vs "out skel" vs "out meta". Request: node[name="Aston Business School"]; out skel; Disk time / Memory: (node 1473072867, lat = 52.4867839, lon = -1.8884618)
  • 12. "out" vs "out skel" vs "out meta". Request: node[name="Aston Business School"]; out meta; Disk time / Memory: (node 1473072867, lat = 52.4867839, lon = -1.8884618), then (node 1473072867, version = 2, timestamp = ..., …, lat = 52.4867839, lon = -1.8884618, amenity=bicycle_parking, bcc_ref=433, bicycle_parking=stands, capacity=10, covered=yes, name=Aston Business School)
  • 13. "out" vs "out skel" vs "out meta". Every statement takes disk time. Internally, we only store skeletons. Request: node[name="Aston Business School"]; out meta; Disk time / Memory: (node 1473072867, lat = 52.4867839, lon = -1.8884618)
  • 14. 3. The query statement pipeline
  • 15. The query statement is a pipeline. Planning decisions. Ids: collect ids of potential results; copy from memory if possible. Derive geo index from query; lookup geo index by ids. Raw data: fetch all skeletons. Cheap filtering. Filtering: filter by key conditionals. Expensive filtering. More conditions are better than fewer.
  • 16. The query statement pipeline: node[name="Aston Business School"]; Steps: planning decisions; collect ids of potential results; copy from memory if possible; derive geo index from query; lookup geo index by ids; fetch all skeletons; cheap filtering; filter by key conditionals; expensive filtering. Disk time: (node 1473072867); (Idx 0x42f00f00); (node 1473072867, lat=52.487, lon=-1.889)
  • 17. The query statement pipeline: node[amenity=bicycle_parking]; Steps: planning decisions; collect ids of potential results; copy from memory if possible; derive geo index from query; lookup geo index by ids; fetch all skeletons; cheap filtering; filter by key conditionals; expensive filtering. Disk time: (node 1000, …, node …, node …, node 1473072867, node …, node …) [~ 80'000 objects]; (Idx 0x1, 0x2, 0x3, …, ...); ((node 1, lat=..., lon=...), …, (node 1473072867, lat=52.487, lon=-1.889), ...); … ~80'000 disk seeks … ~30'000 disk seeks
  • 18. The query statement pipeline: node[amenity=bicycle_parking](52.48, -1.89, 52.49, -1.88); Steps: planning decisions; collect ids of potential results; copy from memory if possible; derive geo index from query; lookup geo index by ids; fetch all skeletons; cheap filtering; filter by key conditionals; expensive filtering. Disk time: (node 1000, …, node …, node …, node 1473072867, node …, node …) [~ 80'000 objects]; (Idx 0x42f00f00); (node 1473072867, lat=52.487, lon=-1.889)
  • 19. The query statement pipeline: node[name="Aston Business School"](51.0, -3.0, 60.0, 3.0); Steps: planning decisions; collect ids of potential results; copy from memory if possible; derive geo index from query; lookup geo index by ids; fetch all skeletons; cheap filtering; filter by key conditionals; expensive filtering. Disk time: (node 1473072867); (Idx 0x42000000, …, Idx 0x42ffffff); (node 1473072867, lat=52.487, lon=-1.889); … ~3'000 disk seeks
  • 20. Summary: Be bold, the server takes care of large queries. Select the right "out" mode for performance and for quick testing. Use all available information, in particular small bounding boxes and specific search conditionals.
  • 21. Thank you for your attention
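
The admission rule from slide 6 can be replayed in a few lines. This is only a toy model under stated assumptions, not the real Overpass dispatcher: the class name `Server` and its method are hypothetical, and the sketch never returns time units when a request finishes, which the real server presumably does.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Toy model of the slide-6 admission rule (hypothetical, simplified)."""

    free_time_units: int

    def request(self, timeout: int) -> bool:
        """Accept a request if its timeout is below half of the free units."""
        if timeout < self.free_time_units / 2:
            # Accepted: reserve the declared runtime.
            self.free_time_units -= timeout
            return True
        # Rejected: the server state stays unchanged.
        return False


# Replay the four client requests shown on the slide.
server = Server(free_time_units=240000)
results = [server.request(t) for t in (180, 86400, 86400, 180)]
print(results, server.free_time_units)
```

The third request is the one the slide marks as refused, because 153420/2 < 86400. Short timeouts pass even on a loaded server, which matches the high priority for short runtimes on slide 7.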
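
To compare the three output modes from slides 10 to 13, it can help to generate the same request once per mode. The following is a minimal sketch that only builds query strings; the helper `name_query` and the mode key `"body"` are hypothetical names, and nothing is sent to a server.

```python
# Output modes named on the slides, mapped to their Overpass QL statements.
OUT_MODES = {
    "skel": "out skel;",  # id and coordinates only
    "body": "out;",       # skeleton plus tags
    "meta": "out meta;",  # tags plus version, timestamp, ...
}


def name_query(name: str, mode: str = "body") -> str:
    """Build a query for nodes with the given name (hypothetical helper)."""
    if mode not in OUT_MODES:
        raise ValueError(f"unknown out mode: {mode!r}")
    # Escape backslashes and double quotes inside the quoted tag value.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'node[name="{escaped}"];\n{OUT_MODES[mode]}'


for mode in OUT_MODES:
    print(name_query("Aston Business School", mode), end="\n\n")
```

Following the advice on slide 20, "skel" is the cheapest choice for quick testing, since it needs no extra disk time for tags or metadata.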
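
Slides 17 and 18 differ only in the bounding box, which lets the server derive the geo index from the query instead of looking it up for about 80'000 ids. A sketch of a helper that always attaches a bounding box and a short timeout might look as follows. The function name `bbox_query` and its defaults are assumptions for illustration; the coordinate order (south, west, north, east) follows the example on slide 18.

```python
def bbox_query(key: str, value: str,
               south: float, west: float, north: float, east: float,
               timeout: int = 60, out: str = "out skel;") -> str:
    """Build a tag query restricted to a bounding box (hypothetical helper)."""
    if not -90 <= south <= north <= 90:
        raise ValueError("expected -90 <= south <= north <= 90")
    if timeout <= 0:
        raise ValueError("timeout must be positive")
    bbox = f"({south},{west},{north},{east})"
    # A timeout below 180 keeps the request in the always-accepted class.
    return f'[timeout:{timeout}];\nnode["{key}"="{value}"]{bbox};\n{out}'


query = bbox_query("amenity", "bicycle_parking", 52.48, -1.89, 52.49, -1.88)
print(query)
```

The smaller the box, the fewer index entries the server has to touch, so a tight box combined with a specific tag condition is the cheapest form of this query.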