A Fast, Offline Reverse Geocoder in Python

My presentation at the PyData conference in London on 21-June-2015.

Published in: Data & Analytics

  1. A FAST OFFLINE REVERSE GEOCODER IN PYTHON
     Ajay Thampi, Data Scientist, OpenSignal
  2. OUTLINE
     • OpenSignal
     • Motivation
     • The Library
     • Demo
     • Performance Results
     • Applications
     • Contributions from the Community
  3. OPENSIGNAL
     • http://opensignal.com | http://wifimapper.com
     • Cellular data points: 41 billion
     • WiFi data points: 50 billion
     • Speed tests: 93 million
  4. MOTIVATION
     • Reverse geocode terabytes of data (~50M coordinates / day)
     • Options:
       • Online web services (Google Maps, OpenStreetMap): restrictive, slow
       • Offline (PostGIS, Python libraries): complex, slow
  5. THE LIBRARY
     • Improves on an existing library by Richard Penman
     • Supports Python 2 and 3
     • Geocodes a lot more information:
       • Place name
       • Country Code (ISO-3166)
       • Admin region 1
       • Admin region 2
       • Coordinates
     • High Performance
     • Open Source (LGPL license)
     • Statistics (since 27/03/2015): Downloads: 2,649; Commits: 41; Committers: 5; Stars: 1,089; Forks: 40 #notsohumblebrag
  6. IMPLEMENTATION
     • Two modes:
       • Mode 1: Single-process
       • Mode 2 (Default): Multi-process
     • Source of data: GeoNames
       • Places with a population > 1000 (Total = 144,859)
     • GPS coordinates of cities loaded into a K-D Tree
     • Nearest neighbour (NN) algorithm
       • Mode 1: cKDTree class in scipy
       • Mode 2: Parallelised cKDTree
     • Dependencies: numpy, scipy
  7. PARALLELISED K-D TREE
     • Uses the multiprocessing module
     • Pros over threading:
       • Exploits multiple CPUs and cores
       • No GIL limitation
     • Cons over threading:
       • Separate memory space => IPC or Shared Memory
       • Static Scheduling
     • K-D Tree Settings:
       • Euclidean distance (Minkowski p-norm where p = 2)
       • Distance upper bound: Inf
     • Refer to the multiprocessing tutorial by Sturla Molden, University of Oslo
  8. DEMO
  9. PERFORMANCE RESULTS
  10. APPLICATIONS (1/2)
      • Top 20 regions in the UK where OpenSignal users run speed tests
      Data from Sep-Dec 2014
  11. APPLICATIONS (2/2)
      • Speed test data points from the Greater London region
      Data from Sep-Dec 2014. Visualisation using Google Fusion Tables
  12. HAT TIP
      • Python 3 Support (Brandon Liu and David J. Felix)
      • C++ Wrapper (Mehdi Lauters)
  13. @thampiman
      Thank You
      Q & A
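
The single-process mode on the Implementation slide (coordinates loaded into a K-D tree, then a nearest-neighbour query through scipy's cKDTree) can be sketched roughly as below. This is a minimal illustration, not the library's code: the four-entry `PLACES` table and the `reverse_geocode` helper are hypothetical stand-ins for the GeoNames data and the real API. The query uses the settings listed on the Parallelised K-D Tree slide (p = 2, distance upper bound of infinity).

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical mini-gazetteer: (name, country code, latitude, longitude).
# The library described in the slides loads GeoNames places instead.
PLACES = [
    ("London", "GB", 51.5074, -0.1278),
    ("Paris", "FR", 48.8566, 2.3522),
    ("New York", "US", 40.7128, -74.0060),
    ("Tokyo", "JP", 35.6762, 139.6503),
]

# Build the tree once from the (lat, lon) pairs.
coords = np.array([(lat, lon) for _, _, lat, lon in PLACES])
tree = cKDTree(coords)


def reverse_geocode(points):
    """Return (name, country code) of the nearest place for each (lat, lon)."""
    queries = np.atleast_2d(np.asarray(points, dtype=float))
    # Euclidean distance (Minkowski p-norm with p = 2), no distance cut-off.
    _, indices = tree.query(queries, k=1, p=2, distance_upper_bound=np.inf)
    return [PLACES[i][:2] for i in indices]


print(reverse_geocode([(51.52, -0.17), (35.0, 139.0)]))
```

Note that the distance here is Euclidean in degree space, as stated on the slides, rather than a great-circle distance.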

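
The multi-process idea on the Parallelised K-D Tree slide (static scheduling across worker processes that do not share memory) might look something like the sketch below. It is an assumption-laden illustration, not the library's implementation: `split_static`, `nearest_chunk` and `parallel_nearest` are hypothetical names, and a brute-force search stands in for the cKDTree query so the example needs only the standard library. The chunks are pickled and sent to the workers by `Pool.map`, which is the IPC cost the slide lists as a con.

```python
import math
from multiprocessing import Pool

# Hypothetical reference points (lat, lon), standing in for GeoNames places.
PLACES = [(51.5074, -0.1278), (48.8566, 2.3522), (40.7128, -74.0060)]


def split_static(items, n_chunks):
    """Static scheduling: cut items into at most n_chunks contiguous parts."""
    if not items:
        return []
    size = math.ceil(len(items) / n_chunks)
    return [items[i:i + size] for i in range(0, len(items), size)]


def _distance(a, b):
    """Euclidean distance between two (lat, lon) pairs in degree space."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def nearest_chunk(chunk):
    """Worker: index of the nearest place for every query in one chunk.

    Brute force is used here in place of a K-D tree query.
    """
    return [
        min(range(len(PLACES)), key=lambda i: _distance(q, PLACES[i]))
        for q in chunk
    ]


def parallel_nearest(queries, n_procs=2):
    """Send one chunk to each worker process and merge results in order."""
    chunks = split_static(queries, n_procs)
    with Pool(processes=n_procs) as pool:
        parts = pool.map(nearest_chunk, chunks)
    return [idx for part in parts for idx in part]
```

When running this as a script, call `parallel_nearest` from under an `if __name__ == "__main__":` guard, since platforms that spawn worker processes re-import the main module.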