A FAST OFFLINE REVERSE GEOCODER IN PYTHON
Ajay Thampi
Data Scientist, OpenSignal
OUTLINE
OpenSignal
Motivation
The Library
Demo
Performance Results
Applications
Contributions from the Community
OPENSIGNAL
http://opensignal.com | http://wifimapper.com
Cellular data points: 41 billion
WiFi data points: 50 billion
Speed tests: 93 million
MOTIVATION
Reverse geocode terabytes of data (~50M coordinates / day)
Options:
Online web services (Google Maps, OpenStreetMap)
Restrictive
Slow
Offline (PostGIS, Python libraries)
Complex
Slow
THE LIBRARY
Improves on an existing library by Richard Penman
Supports Python 2 and 3
Geocodes a lot more information
High Performance
Open Source (LGPL license)
Statistics: (since 27/03/2015)
Downloads: 2,649
Commits: 41
Committers: 5
Stars: 1,089
Forks: 40
#notsohumblebrag
Information returned for each coordinate:
• Place name
• Country code (ISO 3166)
• Admin region 1
• Admin region 2
• Coordinates
IMPLEMENTATION
Two modes:
Mode 1: Single-process
Mode 2 (Default): Multi-process
Source of data: GeoNames
Places with a population > 1000 (Total = 144,859)
GPS coordinates of cities loaded into a K-D Tree
Nearest neighbour (NN) algorithm
Mode 1: cKDTree class in scipy
Mode 2: Parallelised cKDTree
Dependencies:
numpy
scipy
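The single-process lookup described above can be sketched as follows. This is a minimal illustration, not the library's own code: the four sample places, the `nearest_place` helper and the variable names are assumptions made for the example, whereas the library loads its places from GeoNames. It also treats latitude and longitude as flat Euclidean coordinates, which is an approximation.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical sample of (name, latitude, longitude) records, used here
# only for illustration; the library loads its places from GeoNames.
PLACES = [
    ("London", 51.50853, -0.12574),
    ("Paris", 48.85341, 2.3488),
    ("New York City", 40.71427, -74.00597),
    ("Bengaluru", 12.97194, 77.59369),
]

names = [name for name, _, _ in PLACES]
coords = np.array([(lat, lon) for _, lat, lon in PLACES])

# Build the k-d tree once; every lookup afterwards is a nearest-neighbour query.
tree = cKDTree(coords)


def nearest_place(lat, lon):
    """Return the name of the sample place closest to (lat, lon)."""
    _, index = tree.query([lat, lon], k=1)
    return names[index]
```

Calling `nearest_place(51.52, -0.17)` on this sample returns the London entry, since it is the closest of the four points.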
PARALLELISED K-D TREE
Uses the multiprocessing module
Pros over threading:
Exploits multiple CPUs and cores
No GIL limitation
Cons over threading:
Separate memory space => IPC or Shared Memory
Static Scheduling
K-D Tree Settings:
Euclidean distance (Minkowski p-norm where p = 2)
Distance upper bound: Inf
Refer to the multiprocessing tutorial by Sturla Molden, University of Oslo
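One way to combine these points is sketched below, under stated assumptions. The function names (`parallel_query`, `_init_worker`, `_query_chunk`) are hypothetical, and the sketch copies the point set into each worker through the pool initialiser rather than using shared memory, so it may differ from the library's actual implementation. It does use the settings listed above: Euclidean distance (`p=2`) and an infinite distance upper bound.

```python
import multiprocessing as mp

import numpy as np
from scipy.spatial import cKDTree

_tree = None  # per-process k-d tree, set by _init_worker


def _init_worker(points):
    """Build a private k-d tree in this process (separate memory space)."""
    global _tree
    _tree = cKDTree(points)


def _query_chunk(chunk):
    """Return nearest-neighbour indices for one chunk of query points."""
    _, idx = _tree.query(chunk, k=1, p=2, distance_upper_bound=np.inf)
    return idx


def parallel_query(points, queries, processes=2):
    """Nearest-neighbour index for every query, split across processes."""
    points = np.asarray(points, dtype=float)
    queries = np.asarray(queries, dtype=float)
    # Static scheduling: one fixed, contiguous chunk per worker.
    chunks = np.array_split(queries, processes)
    if processes == 1:
        # Single-process fallback: same code path without a pool.
        _init_worker(points)
        results = [_query_chunk(chunk) for chunk in chunks]
    else:
        with mp.Pool(processes, initializer=_init_worker,
                     initargs=(points,)) as pool:
            results = pool.map(_query_chunk, chunks)
    return np.concatenate(results)
```

On platforms that start workers with spawn rather than fork, `parallel_query` should be called from under an `if __name__ == "__main__":` guard. Because the chunks are fixed up front, a slow chunk cannot be rebalanced, which is the static-scheduling drawback noted above.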
DEMO
PERFORMANCE RESULTS
APPLICATIONS (1/2)
• Top 20 regions in the UK where OpenSignal users run speed tests
Data from Sep-Dec 2014
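A ranking like this can be derived by counting geocoded records per region. The sketch below is illustrative only: the `top_regions` helper, the `"admin2"` key and the sample records are assumptions for the example, not OpenSignal data or the library's confirmed output format.

```python
from collections import Counter


def top_regions(results, n=20):
    """Count records per admin region 2 and return the n most common."""
    counts = Counter(record["admin2"] for record in results)
    return counts.most_common(n)


# Hypothetical geocoder output for a handful of speed tests.
sample = [
    {"name": "Bayswater", "admin2": "Greater London", "cc": "GB"},
    {"name": "Camden Town", "admin2": "Greater London", "cc": "GB"},
    {"name": "Salford", "admin2": "Manchester", "cc": "GB"},
    {"name": "Islington", "admin2": "Greater London", "cc": "GB"},
    {"name": "Didsbury", "admin2": "Manchester", "cc": "GB"},
    {"name": "Headingley", "admin2": "Leeds", "cc": "GB"},
]
```

With this sample, `top_regions(sample, n=2)` ranks Greater London first and Manchester second.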
APPLICATIONS (2/2)
• Speed test data points from the Greater London region
Data from Sep-Dec 2014
Visualisation using Google Fusion Tables
HAT TIP
Python 3 Support (Brandon Liu and David J. Felix)
C++ Wrapper (Mehdi Lauters)
@thampiman
Thank You
Q & A
