Geolocation Databases
in Ruby on Rails
Ireneusz Skrobiś
Lead Developer @ Selleo

GeoNames (geolocation database) in Ruby on Rails: how to make it more GitHub-friendly.

Published in: Engineering

  1. Geolocation Databases in Ruby on Rails. Ireneusz Skrobiś, Lead Developer @ Selleo
  2. Challenge description: what do we have?
  3. Challenge description: what do we want?
  4. Challenge description: why do we want to do that?
  5-7. Research
      Candidates: GeoLite, ip2nation, ip2location, GeoNames
      And the winner is: GeoNames
  8-11. Problem
      countryInfo.txt: 252 entries
      allCountries.txt (locations, states, cities): 11,157,064 entries, 1.7 GB (!)
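At roughly 1.7 GB, allCountries.txt is better read line by line than loaded whole. A minimal sketch of that idea, assuming the tab-separated GeoNames dump layout in which the seventh column is the feature class. The sample rows and the `count_by_feature_class` helper are made up for illustration and are not part of the original deck:

```ruby
require 'stringio'

# Hypothetical helper: tallies rows per feature class (7th tab-separated
# column of the GeoNames dump) while holding only one line in memory.
def count_by_feature_class(io)
  counts = Hash.new(0)
  io.each_line do |line|
    fclass = line.chomp.split("\t", -1)[6]
    counts[fclass] += 1
  end
  counts
end

# Made-up sample rows standing in for allCountries.txt (19 fields each).
SAMPLE = [
  "1\tAlpha\tAlpha\t\t50.0\t19.0\tP\tPPL\tPL\t\t72\t\t\t\t1000\t\t250\tEurope/Warsaw\t2015-01-01",
  "2\tBeta Region\tBeta Region\t\t50.1\t19.1\tA\tADM1\tPL\t\t72\t\t\t\t0\t\t200\tEurope/Warsaw\t2015-01-01",
  "3\tGamma Hill\tGamma Hill\t\t50.2\t19.2\tT\tHLL\tPL\t\t72\t\t\t\t0\t\t300\tEurope/Warsaw\t2015-01-01"
].join("\n")

counts = count_by_feature_class(StringIO.new(SAMPLE))
puts counts.inspect
```

For the real file, the same helper could be fed an open file handle, for example `File.open(path) { |f| count_by_feature_class(f) }`, since `File` objects also respond to `each_line`.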
  12. Initial implementation

      create table geo_names (
        geonameid int,
        name varchar(200),
        fclass char(1),
        fcode varchar(10),
        population bigint,
        country varchar(2),
        admin1 varchar(20),
        admin2 varchar(80),
        admin3 varchar(20),
        admin4 varchar(20),
        asciiname varchar(200),
        alternatenames text,
        latitude float,
        longitude float,
        cc2 varchar(100),
        elevation int,
        gtopo30 int,
        timezone varchar(40),
        moddate date
      );

  13. Initial implementation

      create table countries (
        iso_alpha2 char(2),
        name varchar(200),
        geonameId int,
        iso_alpha3 char(3),
        iso_numeric integer,
        fips_code varchar(3),
        capital varchar(200),
        areainsqkm double precision,
        population integer,
        continent varchar(2),
        tld varchar(10),
        currencycode varchar(3),
        currencyname varchar(20),
        phone varchar(20),
        postalcode varchar(100),
        postalcoderegex varchar(200),
        languages varchar(200),
        neighbors varchar(50),
        equivfipscode varchar(3)
      );
  14. Initial implementation

      COPY countries
        (iso_alpha2,iso_alpha3,iso_numeric,fips_code,name,capital,areainsqkm,
         population,continent,tld,currencycode,currencyname,phone,postalcode,
         postalcoderegex,languages,geonameid,neighbors,equivfipscode)
      FROM '#{Rails.root.join('db', 'files', 'countryInfo.txt').to_s}'
      null as '' CSV DELIMITER '\t' HEADER;

  15. Initial implementation

      COPY geo_names
        (geonameid,name,asciiname,alternatenames,latitude,longitude,fclass,
         fcode,country,cc2,admin1,admin2,admin3,admin4,population,elevation,
         gtopo30,timezone,moddate)
      FROM '#{Rails.root.join('db', 'files', 'allCountries.txt').to_s}'
      null as '' CSV DELIMITER '\t' HEADER;
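The column list in the COPY geo_names statement follows the field order of the dump file, so the same list can name the fields of a single line. A rough sketch, assuming tab-separated input with empty strings standing for missing values (mirroring `null as ''`). `GEO_NAME_COLUMNS`, `parse_geo_name` and the sample line are hypothetical and only illustrate the mapping:

```ruby
# Column order copied from the COPY geo_names statement on slide 15.
GEO_NAME_COLUMNS = %i[
  geonameid name asciiname alternatenames latitude longitude fclass
  fcode country cc2 admin1 admin2 admin3 admin4 population elevation
  gtopo30 timezone moddate
].freeze

# Hypothetical helper: turns one tab-separated line into a hash keyed by
# column name. Empty fields become nil, similar to COPY's `null as ''`.
def parse_geo_name(line)
  values = line.chomp.split("\t", -1).map { |v| v.empty? ? nil : v }
  unless values.size == GEO_NAME_COLUMNS.size
    raise ArgumentError, "expected #{GEO_NAME_COLUMNS.size} fields, got #{values.size}"
  end
  GEO_NAME_COLUMNS.zip(values).to_h
end

# Made-up sample line, not taken from the real data set.
line = "1\tAlpha\tAlpha\t\t50.0\t19.0\tP\tPPL\tPL\t\t72\t\t\t\t1000\t\t250\tEurope/Warsaw\t2015-01-01"
row = parse_geo_name(line)
puts row.values_at(:name, :fclass, :fcode, :country, :admin1, :population).inspect
```

All values stay strings here; type conversion is left to the database, as in the slides.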
  16. Initial implementation

      # get all administrative regions for a country code
      where(country: code, fcode: 'ADM1').order(:name)

      # get all cities/villages for a country code and administrative region
      where(country: code, admin1: adm.admin1, fclass: 'P')
        .where.not(population: 0)
        .order(:name)
  17. Findings
      we don’t need all fields
      we don’t need all entries
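Both findings can be read off the two lookups on slide 16: they touch only name, fclass, fcode, population, country and admin1, and they only ever return ADM1 regions or populated places with a non-zero population. A sketch of that reasoning as a plain-Ruby row filter, assuming rows are already parsed into hashes with symbol keys. `NEEDED_FIELDS`, `needed?` and `trim` are hypothetical names, geonameid is kept as an identifier by assumption, and the sample rows are invented:

```ruby
# Fields the two lookups rely on, plus geonameid as a row identifier
# (keeping the identifier is an assumption of this sketch).
NEEDED_FIELDS = %i[geonameid name fclass fcode population country admin1].freeze

# A row is worth keeping if one of the two lookups could return it.
def needed?(row)
  row[:fcode] == 'ADM1' || (row[:fclass] == 'P' && row[:population].to_i != 0)
end

# Drops unneeded rows first, then unneeded fields.
def trim(rows)
  rows.select { |row| needed?(row) }
      .map { |row| row.slice(*NEEDED_FIELDS) }
end

# Invented sample data.
rows = [
  { geonameid: 1, name: 'Region', fclass: 'A', fcode: 'ADM1', population: 0,    country: 'PL', admin1: '72', timezone: 'Europe/Warsaw' },
  { geonameid: 2, name: 'County', fclass: 'A', fcode: 'ADM2', population: 0,    country: 'PL', admin1: '72', timezone: 'Europe/Warsaw' },
  { geonameid: 3, name: 'Town',   fclass: 'P', fcode: 'PPL',  population: 1000, country: 'PL', admin1: '72', timezone: 'Europe/Warsaw' },
  { geonameid: 4, name: 'Hamlet', fclass: 'P', fcode: 'PPL',  population: 0,    country: 'PL', admin1: '72', timezone: 'Europe/Warsaw' },
  { geonameid: 5, name: 'Hill',   fclass: 'T', fcode: 'HLL',  population: 0,    country: 'PL', admin1: '72', timezone: 'Europe/Warsaw' }
]

trimmed = trim(rows)
puts trimmed.map { |r| r[:name] }.inspect
```

A missing population is treated like zero here (`nil.to_i` is 0), which matches the lookup, because a SQL `!= 0` comparison does not match NULL either.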
  18-28. Database adjustment

      START
        COUNT: 11,157,064  DB: 1,748,198,652  TIME: 2206ms

      REMOVE COLUMNS
        geo_names: admin2, admin3, admin4, asciiname, alternatenames, latitude,
          longitude, cc2, elevation, gtopo30, timezone, moddate
        countries: iso_alpha3, iso_numeric, fips_code, capital, areainsqkm,
          population, continent, tld, currencycode, currencyname, phone,
          postalcode, postalcoderegex, languages, neighbors, equivfipscode
        COUNT: 11,157,064  DB: 409,655,812  TIME: 2197ms

      GeoName.where.not(fclass: %w(A P)).delete_all
        COUNT: 4,729,998  DB: 160,005,106  TIME: 1310ms

      GeoName.where(fclass: 'P', population: 0).delete_all
        COUNT: 723,681  DB: 27,268,183  TIME: 854ms

      GeoName.where(fclass: 'A').where.not(fcode: 'ADM1').delete_all
        COUNT: 367,782  DB: 13,644,454  TIME: 770ms (61 ms after restart)
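The cumulative effect of these steps is easier to judge as ratios. A small sketch that computes them from the START and final figures on the slides. It assumes the DB figure is a size in bytes, which the slides do not state explicitly, and `percent_remaining` is a hypothetical helper:

```ruby
# Figures copied from the START line and the last step of the slides.
START = { count: 11_157_064, db: 1_748_198_652 }.freeze
FINAL = { count: 367_782, db: 13_644_454 }.freeze

# Share of the original that is left, as a percentage with two decimals.
def percent_remaining(before, after)
  (after.to_f / before * 100).round(2)
end

rows_left = percent_remaining(START[:count], FINAL[:count])
size_left = percent_remaining(START[:db], FINAL[:db])

puts "rows remaining: #{rows_left}%"
puts "size remaining: #{size_left}%"
```

By these numbers about 3.3% of the rows and about 0.78% of the size survive the adjustment.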
  29. Final implementation

      psql project_dev_db

      COPY countries TO '/Users/irek/rails_workspace/battleriff/db/files/countries.csv' DELIMITER E'\t' CSV HEADER;
      COPY geo_names TO '/Users/irek/rails_workspace/battleriff/db/files/geo_names.csv' DELIMITER E'\t' CSV HEADER;
  30. Final implementation

      class AddGeoNamesTables < ActiveRecord::Migration
        def up
          execute <<-SQL
            create table geo_names (
              geonameid int,
              name varchar(200),
              fclass char(1),
              fcode varchar(10),
              population bigint,
              country varchar(2),
              admin1 varchar(20)
            );

            create table countries (
              iso_alpha2 char(2),
              name varchar(200),
              geonameid int
            );
          SQL
        end

        def down
          drop_table :countries
          drop_table :geo_names
        end
      end
  31. Final implementation

      namespace :geo_names do
        desc "Setup ALL data needed for countries/states/cities selection"
        task setup_all: :environment do
          ActiveRecord::Base.connection.execute <<-SQL
            copy countries (iso_alpha2,name,geonameid)
            from '#{Rails.root.join('db', 'files', 'countries.csv').to_s}'
            null as '' CSV DELIMITER '\t' HEADER;

            copy geo_names (geonameid,name,fclass,fcode,population,country,admin1)
            from '#{Rails.root.join('db', 'files', 'geo_names.csv').to_s}'
            null as '' CSV DELIMITER '\t' HEADER;
          SQL
        end
      end
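The copy statements in this task are meant to use a tab as the delimiter. Inside an interpolating Ruby heredoc, writing `\t` between the quotes yields a real tab character before the SQL ever reaches PostgreSQL, so the server sees a one-character delimiter. A sketch of building such a statement outside Rails. `copy_sql` is a hypothetical helper, the path is a placeholder, and nothing here talks to a database:

```ruby
# Hypothetical helper: builds a COPY ... FROM statement for a
# tab-delimited file with a header row. It only returns the SQL string;
# executing it would need a database connection, which is out of scope.
def copy_sql(table, columns, path)
  <<~SQL
    copy #{table} (#{columns.join(',')})
    from '#{path}'
    null as '' CSV DELIMITER '\t' HEADER;
  SQL
end

# Placeholder path, not the one used in the original project.
sql = copy_sql('countries', %w[iso_alpha2 name geonameid], '/tmp/countries.csv')
puts sql
puts "contains a real tab: #{sql.include?("\t")}"
```

In a non-interpolating heredoc (`<<~'SQL'`) the backslash and the letter would be passed through as two characters, so the choice of heredoc style matters here.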
  32. Thank you! Live long and prosper :) Ireneusz Skrobiś, Lead Developer @ Selleo
