Geolocation with Cassandra
Austin Cassandra Users – Jan 21, 2016
Matt Vorst
• Cassandra User
– Since 2011
• Architect / Java developer
• Corporate Life
– EntekIRD & Rockwell Automation
• Serial Entrepreneur
– EventsInCincinnati.com – Co-founder
– Dotloop, Inc. – Co-founder and CTO
– Physi, Inc. – Co-founder and C*O
Physi [fiz-ee] (noun)
1. a mobile app that pairs nearby people to play sports
2. a movement to make a smaller, happier, healthier world through play
Why Cassandra
• Operations is Hard
– Most relational DBs don’t scale easily or well
– Murphy’s Law always strikes at the worst time
– Recovery shouldn’t come at a high cost
• Distributed Design
– Cassandra is a distributed technology
– Applications are designed to be distributed
Necessary Location Services
• Proximity Search
– Postal code range search
– Distance between postal codes
• Location Conversion
– Postal code to latitude/longitude
– Latitude/longitude to postal code
• Search
– City name lookup
Setup
• Create the Keyspace
cqlsh> CREATE KEYSPACE physi WITH replication =
{'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> USE physi;
Postal Code to Latitude/Longitude
• Use Case
– Place markers on a map
• Solution
– Buy a database
– PK: Country/postal code
Postal Code to Latitude/Longitude
• Create Column Family
cqlsh>CREATE TABLE zip_code_master (
location_country text, zip_code text, location_uuid uuid,
location_type text, city text, county text, state text,
latitude_e6 bigint, longitude_e6 bigint,
PRIMARY KEY (location_country, zip_code));
Postal Code to Latitude/Longitude
• Add data
cqlsh>INSERT INTO zip_code_master
(location_country, zip_code, location_uuid, location_type,
city, county, state, latitude_e6, longitude_e6)
VALUES('US', '45219',
7b0e6b7f-0d9a-3a66-9f9a-0df17ed5dc39,
'REGIONAL', 'Cincinnati', 'Hamilton', 'OH',
39127564, -84514489);
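The `_e6` columns appear to store coordinates as fixed-point integers (decimal degrees multiplied by 10^6), which matches the sample row: 39.127564 is stored as 39127564. A minimal Python sketch of that encoding, assuming round-to-nearest; `to_e6` and `from_e6` are hypothetical helper names, not part of the original code:

```python
def to_e6(degrees: float) -> int:
    """Encode decimal degrees as a fixed-point integer (degrees * 10^6)."""
    return round(degrees * 1_000_000)


def from_e6(value_e6: int) -> float:
    """Decode a fixed-point *_e6 integer back to decimal degrees."""
    return value_e6 / 1_000_000


# Coordinates from the 45219 example row
lat_e6 = to_e6(39.127564)
lon_e6 = to_e6(-84.514489)
```

Storing integers avoids floating-point drift in the key columns; one unit of `latitude_e6` is about 0.1 m of latitude.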
Postal Code to Latitude/Longitude
• Search
cqlsh>SELECT * FROM zip_code_master WHERE
location_country = 'US' AND zip_code = '45219';
• Results
location_country | zip_code | city | county | latitude_e6 | location_type | location_uuid | longitude_e6 | state
------------------+----------+------------+----------+-------------+---------------+--------------------------------------+--------------+------
US | 45219 | Cincinnati | Hamilton | 39127564 | REGIONAL | 7b0e6b7f-0d9a-3a66-9f9a-0df17ed5dc39 | -84514489 | OH
Postal Code to Latitude/Longitude
• Things to Know
– Row width: ~10
– Postal codes cover different areas
– A single postal code can span multiple cities, counties, and even states
– The largest postal code covers 10,000 mi²
Latitude/Longitude to Postal Code
• Use Case
– Determine, server side, which postal code a user is currently in
– Use this to return suggestions
Latitude/Longitude to Postal Code
• The Relational Way
– Draw a box, loop, and calculate
– Query:
SELECT * FROM location_table
WHERE (min lat) < latitude AND latitude < (max lat)
AND (min long) < longitude AND longitude < (max long)
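The "draw a box" step needs the minimum and maximum latitude and longitude for a search radius. A sketch under the usual spherical approximation (about 69 miles per degree of latitude, shrinking by cos(latitude) for longitude); `bounding_box` is a hypothetical helper, and it does not handle boxes that cross a pole or the antimeridian:

```python
import math

MILES_PER_DEGREE_LAT = 69.0  # approximate; varies slightly with latitude


def bounding_box(lat: float, lon: float, radius_miles: float):
    """Return (min_lat, max_lat, min_lon, max_lon) enclosing a circle of radius_miles.

    Assumes a spherical Earth; not valid near the poles or the antimeridian.
    """
    dlat = radius_miles / MILES_PER_DEGREE_LAT
    # A degree of longitude covers fewer miles the farther you are from the equator.
    dlon = radius_miles / (MILES_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


# Box for a 7-mile radius around the 45219 centroid
min_lat, max_lat, min_lon, max_lon = bounding_box(39.127564, -84.514489, 7.0)
```

The box's corners lie outside the circle, so the rows it returns still have to be filtered by true distance: that is the "loop and calculate" part.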
Latitude/Longitude to Postal Code
• Cassandra Solution
– Prebuild a lookup table
• Slice the US into cells 7 mi tall by ≤7 mi wide
• ~69 miles per degree of latitude
• Lines of longitude converge toward the poles, so a degree of longitude spans fewer miles farther north
– PK: latE1|longE1
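The `latE1|longE1` key can be read as each coordinate multiplied by 10 and cut to an integer, so every cell spans 0.1° in each direction. The sample data maps 39.127564, -84.514489 to (391, -845), which suggests truncation toward zero rather than floor (floor would give -846); that reading and the helper names below are assumptions. The sketch also shows why the cells are about 7 mi tall but at most 7 mi wide:

```python
import math

MILES_PER_DEGREE_LAT = 69.0  # approximate


def cell_key(lat: float, lon: float) -> tuple[int, int]:
    """Map a coordinate to its (latitude_e1, longitude_e1) grid cell.

    int() truncates toward zero, which reproduces the sample (391, -845) key.
    """
    return int(lat * 10), int(lon * 10)


def cell_size_miles(lat: float) -> tuple[float, float]:
    """Approximate (height, width) in miles of a 0.1-degree cell at this latitude."""
    height = 0.1 * MILES_PER_DEGREE_LAT
    width = height * math.cos(math.radians(lat))
    return height, width
```

At Cincinnati's latitude a cell comes out roughly 6.9 mi tall and about 5.4 mi wide.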
Latitude/Longitude to Postal Code
• Cassandra Solution (cont.)
– Build: Add bordering postal codes
– Read: Loop and calculate distance
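One possible reading of the build and read steps, sketched under assumptions: at build time each postal code is written into its own cell and the eight bordering cells, so a user near a cell edge still sees nearby codes; at read time the server computes the great-circle (haversine) distance from the user to each candidate's centroid and keeps the closest. The 3x3 neighborhood, the centroid-based match, the assumption that the `location` JSON carries a centroid, and all names here are illustrative, not the original implementation. A nearest centroid is also only an approximation of the true postal code boundary.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def bordering_cells(lat_e1: int, lon_e1: int) -> list[tuple[int, int]]:
    """Build step: a cell plus its eight neighbors (a 3x3 block of 0.1-degree cells).

    A postal code would be written into every cell returned for its own cell.
    """
    return [(lat_e1 + dy, lon_e1 + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def nearest_zip(lat: float, lon: float, candidates) -> str:
    """Read step: return the zip_code whose centroid is closest to (lat, lon).

    candidates: iterable of (zip_code, centroid_lat, centroid_lon), e.g. one
    cell's rows with the centroid parsed out of the location JSON (assumed).
    """
    best = min(candidates, key=lambda c: haversine_miles(lat, lon, c[1], c[2]))
    return best[0]
```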
Latitude/Longitude to Postal Code
• Create Column Family
cqlsh>CREATE TABLE latitude_longitude_zip_code
(latitude_e1 int, longitude_e1 int, location_country text,
zip_code text, location text,
PRIMARY KEY ((latitude_e1, longitude_e1),
location_country, zip_code));
Latitude/Longitude to Postal Code
• Add data
cqlsh>INSERT INTO latitude_longitude_zip_code
(latitude_e1, longitude_e1, location_country, zip_code,
location) VALUES(391,-845,'US','45219','{json data}');
cqlsh>INSERT INTO latitude_longitude_zip_code
(latitude_e1, longitude_e1, location_country, zip_code,
location) VALUES(391,-845,'US','45220','{json data}');
Latitude/Longitude to Postal Code
• Search
cqlsh>SELECT * FROM latitude_longitude_zip_code
WHERE latitude_e1 = 391 AND longitude_e1 = -845;
• Results
latitude_e1 | longitude_e1 | location_country | zip_code | location
-------------+--------------+------------------+----------+-------------
391 | -845 | US | 45206 | {json data}
391 | -845 | US | 45219 | {json data}
391 | -845 | US | 45220 | {json data}
Latitude/Longitude to Postal Code
• Things to Know
– Row width: 1 to ~50
– This was a short-lived solution
– We now primarily use client-side location services
– It is still used as a fallback for the web
– Creation of the lookup table took 3 hours on
localhost with RAID 0 SSDs
City Name Lookup
• Use Case
– Auto-complete city name
• Solution
– Create a lookup
– RK (row key): searchTerm
– CN (column name): (zero-padded count)|country|city
City Name Lookup
• Create Column Family
cqlsh>CREATE TABLE name_search
(search_term text, occurrence_count int,
location_country text, city text, state text, location text,
PRIMARY KEY ((search_term), occurrence_count,
location_country, city, state));
City Name Lookup
• Add data
cqlsh> INSERT INTO name_search
(search_term, occurrence_count, location_country, city,
state, location)
VALUES ('aus', 31, 'US', 'austin', 'TX', '{json data}');
cqlsh> INSERT INTO name_search
(search_term, occurrence_count, location_country, city,
state, location)
VALUES ('aus', 10, 'US', 'austell', 'GA', '{json data}');
City Name Lookup
• Search
cqlsh>SELECT * FROM name_search
WHERE search_term = 'aus'
ORDER BY occurrence_count DESC;
• Results
search_term | occurrence_count | location_country | city | state | location
-------------+------------------+------------------+-------------+-------+-------------
aus | 31 | US | austin | TX | {json data}
aus | 10 | US | austell | GA | {json data}
aus | 10 | US | ausablefork | NY | {json data}
City Name Lookup
• Things to Know
– Row width: 10 – 60K
– Strip whitespace and special characters, and convert
search terms to lowercase
– Only search when 2 or more characters have
been entered
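The normalization rules above, combined with one row per typed prefix, can be sketched as follows. Writing every prefix of at least two characters as a `search_term` is an inference from the 2-character minimum and the 'aus' example, not something the slides state; the regex also drops non-ASCII letters, which may differ from the original behavior. All names are hypothetical.

```python
import re

MIN_SEARCH_LENGTH = 2  # only search once 2 or more characters are entered


def normalize(text: str) -> str:
    """Lowercase, then strip whitespace and special characters (keeps a-z and 0-9)."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def search_terms(city: str) -> list[str]:
    """Every prefix of the normalized city name that a user could have typed."""
    name = normalize(city)
    return [name[:i] for i in range(MIN_SEARCH_LENGTH, len(name) + 1)]


def should_search(user_input: str) -> bool:
    """Only query once the normalized input is long enough."""
    return len(normalize(user_input)) >= MIN_SEARCH_LENGTH
```

With these rules "Ausable Fork" normalizes to `ausablefork`, the same form that appears in the result rows, and it gets one `name_search` row per prefix from `au` up to the full name.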
Postal Code Range Search
• Use Case
– Find nearby neighborhoods
• Solution
– Create a lookup table
– RK: country|postal code
Postal Code Range Search
• Create Column Family
cqlsh>CREATE TABLE zip_code_distance
(location_country text, zip_code text, distance_e2 int,
location text,
PRIMARY KEY ((location_country, zip_code),
distance_e2));
Postal Code Range Search
• Add Data
cqlsh>INSERT INTO zip_code_distance
(location_country, zip_code, distance_e2, location)
VALUES('US', '78741', 0, '{json data for 78741}');
cqlsh>INSERT INTO zip_code_distance
(location_country, zip_code, distance_e2, location)
VALUES('US', '78741', 180, '{json data for 78702}');
cqlsh>INSERT INTO zip_code_distance
(location_country, zip_code, distance_e2, location)
VALUES('US', '78741', 220, '{json data for 78721}');
Postal Code Range Search
• Search
cqlsh>SELECT * FROM zip_code_distance
WHERE location_country = 'US' AND zip_code = '78741'
AND distance_e2 < 200
ORDER BY distance_e2;
• Results
location_country | zip_code | distance_e2 | location
------------------+----------+-------------+-----------------------
US | 78741 | 0 | {json data for 78741}
US | 78741 | 180 | {json data for 78702}
Postal Code Range Search
• Things to know
– Row width: 1 to ~45K
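The rows for one `zip_code_distance` partition could be prebuilt by measuring from the partition's postal code to every other centroid, keeping those within a cutoff radius, and scaling to hundredths for `distance_e2`. Miles as the unit, the haversine formula, the radius cutoff, and the function names are assumptions for illustration; the slides give only the scaled sample values (0, 180, 220).

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def distance_rows(origin: str, centroids: dict, max_miles: float) -> list:
    """Build (distance_e2, zip_code) rows for one zip_code_distance partition.

    origin: the partition's zip code; centroids: zip_code -> (lat, lon).
    The origin itself is included at distance 0, like the sample data.
    """
    o_lat, o_lon = centroids[origin]
    rows = []
    for zip_code, (lat, lon) in centroids.items():
        miles = haversine_miles(o_lat, o_lon, lat, lon)
        if miles <= max_miles:
            rows.append((round(miles * 100), zip_code))  # hundredths of a mile
    rows.sort()  # nearest first, matching the distance_e2 clustering order
    return rows
```

The sketch makes one property of the schema visible: because `distance_e2` is the only clustering column, two neighbors at the same rounded distance would share a primary key, and Cassandra would keep only the later write. Adding the neighbor's postal code as a second clustering column is one way to avoid that.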
Distance Between Postal Codes
• Use Case
– Estimate the distance between postal
codes
• Solution
– Create a lookup table
– RK: country|postal code
– CN: country|postal code
– Value: distanceE2
Distance Between Postal Codes
• Create Column Family
cqlsh>CREATE TABLE zip_code_distance_between
(location_country_1 text, zip_code_1 text,
location_country_2 text, zip_code_2 text, distance_e2 int,
PRIMARY KEY ((location_country_1, zip_code_1),
location_country_2, zip_code_2));
Distance Between Postal Codes
• Add Data
cqlsh>INSERT INTO zip_code_distance_between
(location_country_1, zip_code_1, location_country_2,
zip_code_2, distance_e2)
VALUES('US', '78741', 'US', '78741', 0);
cqlsh>INSERT INTO zip_code_distance_between
(location_country_1, zip_code_1, location_country_2,
zip_code_2, distance_e2)
VALUES('US', '78741', 'US', '78702', 180);
Distance Between Postal Codes
• Select
cqlsh>SELECT * FROM zip_code_distance_between
WHERE location_country_1 = 'US'
AND zip_code_1 = '78741'
AND location_country_2 = 'US'
AND zip_code_2 = '78702';
• Results
location_country_1 | zip_code_1 | location_country_2 | zip_code_2 | distance_e2
--------------------+------------+--------------------+------------+-------------
US | 78741 | US | 78702 | 180
Distance Between Postal Codes
• Things to know
– Row width: ~45K
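Because each partition holds a row for roughly every postal code, the table grows with the square of the number of postal codes. A back-of-the-envelope estimate, taking the ~45K row width above as the postal code count N:

```latex
N \approx 45{,}000, \qquad
\text{rows} \approx N \times N = 45{,}000^{2} \approx 2.0 \times 10^{9}
```

Storing each pair only once (for example, with the lower postal code always as `zip_code_1`) would roughly halve that, at the cost of ordering the two codes before every lookup.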
Final Thoughts
• Why just Cassandra?
– Fewer technologies to support
• Operations
• Development
– But be reasonable
• Prebuild reference data
– Consider prebuilding data to reduce read time
Questions & Contact Info
Matt Vorst
CTO Physi, Inc.
matt@physi.rocks

EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 

Geolocation and Cassandra at Physi

  • 1. Geolocation with Cassandra Austin Cassandra Users – Jan 21, 2016
  • 2. Matt Vorst • Cassandra User – Since 2011 • Architect / Java developer • Corporate Life – EntekIRD & Rockwell Automation • Serial Entrepreneur – EventsInCincinnati.com – Co-founder – Dotloop, Inc. – Co-founder and CTO – Physi, Inc. – Co-founder and C*O
  • 3. Physi [fiz-ee] (noun) 1. a mobile app that pairs nearby people to play sports 2. a movement to make a smaller, happier, healthier world through play
  • 4. Why Cassandra • Operations is Hard – Most relational DBs don’t scale easily or well – Murphy’s Law always strikes at the worst time – Recovery shouldn’t come at a high cost • Distributed Design – Cassandra is a distributed technology – Applications are designed to be distributed
  • 5. Necessary Location Services • Proximity Search – Postal code range search – Distance between postal codes • Location Conversion – Postal code to latitude/longitude – Latitude/longitude to postal code • Search – City name lookup
  • 6. Setup • Create the Keyspace cqlsh> CREATE KEYSPACE physi WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; cqlsh> USE physi;
  • 7. Postal Code to Latitude/Longitude • Use Case – Place markers on a map • Solution – Buy a postal code database – PK: Country/postal code
  • 8. Postal Code to Latitude/Longitude • Create Column Family cqlsh>CREATE TABLE zip_code_master ( location_country text, zip_code text, location_uuid uuid, location_type text, city text, county text, state text, latitude_e6 bigint, longitude_e6 bigint, PRIMARY KEY (location_country, zip_code));
  • 9. Postal Code to Latitude/Longitude • Add data cqlsh>INSERT INTO zip_code_master (location_country, zip_code, location_uuid, location_type, city, county, state, latitude_e6, longitude_e6) VALUES('US', '45219', 7b0e6b7f-0d9a-3a66-9f9a-0df17ed5dc39, 'REGIONAL', 'Cincinnati', 'Hamilton', 'OH', 39127564, -84514489);
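Storing coordinates as `bigint` micro-degrees (degrees × 10^6, hence the `_e6` suffix) keeps them exact and avoids floating-point columns. A minimal Python sketch of the conversion; the helper names are hypothetical, and rounding to the nearest micro-degree is an assumption rather than something the slides specify:

```python
def to_e6(degrees: float) -> int:
    """Decimal degrees -> integer micro-degrees, rounded to the nearest unit (assumed)."""
    return round(degrees * 1_000_000)


def from_e6(value_e6: int) -> float:
    """Integer micro-degrees -> decimal degrees."""
    return value_e6 / 1_000_000


# Values for the 45219 (Cincinnati) row inserted above.
latitude_e6 = to_e6(39.127564)
longitude_e6 = to_e6(-84.514489)
print(latitude_e6, longitude_e6)
```

Six decimal places resolve roughly 0.1 m of latitude, far finer than a postal-code centroid needs.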
  • 10. Postal Code to Latitude/Longitude • Search cqlsh>SELECT * FROM zip_code_master WHERE location_country = 'US' AND zip_code = '45219'; • Results
     location_country | zip_code | city       | county   | latitude_e6 | location_type | location_uuid                        | longitude_e6 | state
    ------------------+----------+------------+----------+-------------+---------------+--------------------------------------+--------------+------
     US               | 45219    | Cincinnati | Hamilton | 39127564    | REGIONAL      | 7b0e6b7f-0d9a-3a66-9f9a-0df17ed5dc39 | -84514489    | OH
  • 11. Postal Code to Latitude/Longitude • Things to Know – Row width: ~10 – Postal codes vary widely in area – A single postal code can span multiple cities, counties, and even states – The largest postal code covers 10,000 mi²
  • 12. Latitude/Longitude to Postal Code • Use Case – Determine, server-side, which postal code a user is currently in – Use this to return suggestions
  • 13. Latitude/Longitude to Postal Code • The Relational Way – Draw a box, loop, and calculate – Query: SELECT * FROM location_table WHERE (min lat) < latitude AND latitude < (max lat) AND (min long) < longitude AND longitude < (max long)
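The "draw a box" step needs the min/max bounds for a search radius. A minimal Python sketch, assuming the slides' figure of ~69 miles per degree of latitude and a degree of longitude that shrinks by cos(latitude); the function name and the 5-mile radius are hypothetical:

```python
import math

MILES_PER_DEGREE_LAT = 69.0  # approximate spacing quoted on the slides


def bounding_box(lat: float, lon: float, radius_miles: float) -> tuple[float, float, float, float]:
    """Return (min_lat, max_lat, min_lon, max_lon) enclosing a circle of radius_miles.

    A degree of longitude covers roughly 69 * cos(latitude) miles, so the box is
    wider in degrees of longitude than latitude. Not valid near the poles or
    across the antimeridian.
    """
    d_lat = radius_miles / MILES_PER_DEGREE_LAT
    d_lon = radius_miles / (MILES_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return lat - d_lat, lat + d_lat, lon - d_lon, lon + d_lon


# Bounds for the (min lat) ... (max long) placeholders, around 45219's centroid.
min_lat, max_lat, min_lon, max_lon = bounding_box(39.127564, -84.514489, 5.0)
print(min_lat, max_lat, min_lon, max_lon)
```

Rows inside the box still need the "calculate" pass, because the box's corners lie outside the circle.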
  • 14. Latitude/Longitude to Postal Code • Cassandra Solution – Prebuild a lookup table • Slice the US into 0.1° cells, about 7 mi north-south by ≤7 mi east-west • ~69 miles per degree of latitude • Lines of longitude are not equally spaced; they converge toward the poles – PK: latE1|longE1
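The `latE1|longE1` key is each coordinate scaled by 10, i.e. a 0.1° grid cell. A Python sketch computing it from the `_e6` columns; the function names are hypothetical. Truncation toward zero (what a Java `(int)` cast does) is an assumption inferred from the examples: 45219 at (39127564, -84514489) is stored under (391, -845), whereas flooring the longitude would give -846.

```python
import math

MILES_PER_DEGREE_LAT = 69.0  # approximate, as on the slide


def truncate_e6_to_e1(value_e6: int) -> int:
    """Scale micro-degrees down to tenths of a degree, truncating toward zero.

    Pure integer math avoids float rounding surprises at cell edges.
    """
    tenths = abs(value_e6) // 100_000  # 10**6 / 10**1
    return tenths if value_e6 >= 0 else -tenths


def cell_key(latitude_e6: int, longitude_e6: int) -> tuple[int, int]:
    """Partition key (latitude_e1, longitude_e1) for a point."""
    return truncate_e6_to_e1(latitude_e6), truncate_e6_to_e1(longitude_e6)


def cell_size_miles(latitude_e1: int) -> tuple[float, float]:
    """Approximate (north-south, east-west) extent of a 0.1-degree cell."""
    north_south = MILES_PER_DEGREE_LAT / 10
    east_west = north_south * math.cos(math.radians(latitude_e1 / 10))
    return north_south, east_west


print(cell_key(39127564, -84514489))
print(cell_size_miles(391))
```

One side effect of truncating toward zero: the cells touching the equator and the prime meridian are twice as wide, which does not matter for US data.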
  • 15. Latitude/Longitude to Postal Code • Cassandra Solution (cont.) – Build: Add bordering postal codes – Read: Loop and calculate distance
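The read step loops over the handful of postal codes stored in the user's cell and keeps the closest. A Python sketch using the haversine great-circle formula; the slides only say "calculate distance", so haversine is one reasonable choice, not necessarily Physi's. It assumes each candidate's centroid latitude/longitude is available (for example from the `location` JSON), and the function names are hypothetical. Because postal codes vary in area, the nearest centroid only approximates the postal code that actually contains the user.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_zip(lat: float, lon: float, candidates: list[tuple[str, float, float]]) -> str:
    """Return the zip_code whose centroid is closest to (lat, lon).

    candidates holds (zip_code, latitude, longitude) for each row returned by
    the single-partition cell query.
    """
    best = min(candidates, key=lambda c: haversine_miles(lat, lon, c[1], c[2]))
    return best[0]
```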
  • 16. Latitude/Longitude to Postal Code • Create Column Family cqlsh>CREATE TABLE latitude_longitude_zip_code (latitude_e1 int, longitude_e1 int, location_country text, zip_code text, location text, PRIMARY KEY ((latitude_e1, longitude_e1), location_country, zip_code));
  • 17. Latitude/Longitude to Postal Code • Add data cqlsh>INSERT INTO latitude_longitude_zip_code (latitude_e1, longitude_e1, location_country, zip_code, location) VALUES(391,-845,'US','45219','{json data}'); cqlsh>INSERT INTO latitude_longitude_zip_code (latitude_e1, longitude_e1, location_country, zip_code, location) VALUES(391,-845,'US','45220','{json data}');
  • 18. Latitude/Longitude to Postal Code • Search cqlsh>SELECT * FROM latitude_longitude_zip_code WHERE latitude_e1 = 391 AND longitude_e1 = -845; • Results
     latitude_e1 | longitude_e1 | location_country | zip_code | location
    -------------+--------------+------------------+----------+-------------
     391         | -845         | US               | 45206    | {json data}
     391         | -845         | US               | 45219    | {json data}
     391         | -845         | US               | 45220    | {json data}
  • 19. Latitude/Longitude to Postal Code • Things to Know – Row width: 1 to ~50 – This was a short-lived solution – Now primarily using client location services – Still used as a fallback for web – Building the lookup table took 3 hours on localhost with RAID 0 SSDs
  • 20. City Name Lookup • Use Case – Auto-complete city name • Solution – Create a lookup – RK (row key): searchTerm – CN (column name): (zero-padded count)|country|city
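In the row-key/column-name layout on this slide, columns within a row sort by name, so the count is zero-padded to make string order agree with numeric order. A Python sketch assuming a string-comparing column comparator and a fixed width of 10 digits (enough for any non-negative 32-bit count); both are assumptions, and the function name is hypothetical:

```python
def column_name(count: int, country: str, city: str, width: int = 10) -> str:
    """Build '(zero-padded count)|country|city' so that names sort by count first."""
    return f"{count:0{width}d}|{country}|{city}"


# Unpadded, string order disagrees with numeric order: "31" sorts before "9".
print(sorted(["31", "9"]))

# Padded, sorting the names sorts by count; reverse for most-popular-first.
names = [column_name(10, "US", "austell"), column_name(31, "US", "austin")]
print(sorted(names, reverse=True))
```

The CQL table on the next slide gets the same effect without padding by making `occurrence_count` an `int` clustering column.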
  • 21. City Name Lookup • Create Column Family cqlsh>CREATE TABLE name_search (search_term text, occurrence_count int, location_country text, city text, state text, location text, PRIMARY KEY ((search_term), occurrence_count, location_country, city, state));
  • 22. City Name Lookup • Add data cqlsh> INSERT INTO name_search (search_term, occurrence_count, location_country, city, state, location) VALUES ('aus', 31, 'US', 'austin', 'TX', '{json data}'); cqlsh> INSERT INTO name_search (search_term, occurrence_count, location_country, city, state, location) VALUES ('aus', 10, 'US', 'austell', 'GA', '{json data}');
  • 23. City Name Lookup • Search cqlsh>SELECT * FROM name_search WHERE search_term = 'aus' ORDER BY occurrence_count DESC; • Results
     search_term | occurrence_count | location_country | city        | state | location
    -------------+------------------+------------------+-------------+-------+-------------
     aus         | 31               | US               | austin      | TX    | {json data}
     aus         | 10               | US               | austell     | GA    | {json data}
     aus         | 10               | US               | ausablefork | NY    | {json data}
  • 24. City Name Lookup • Things to Know – Row width: 10 – 60K – Strip whitespace and special characters, and convert search terms to lowercase – Only search once 2 or more characters have been entered
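A sketch of the normalization and the prefix fan-out. Presumably each prefix of a city's normalized name, from 2 characters up, becomes its own `search_term` partition, which is what lets 'aus' return Austin. The regex's definition of "special characters" (anything outside a-z and 0-9) and the function names are assumptions:

```python
import re


def normalize(text: str) -> str:
    """Lowercase and drop whitespace and special characters.

    Assumption: anything not a-z or 0-9 is "special", so accented letters are
    dropped rather than transliterated.
    """
    return re.sub(r"[^a-z0-9]", "", text.lower())


def search_terms(city: str, min_length: int = 2) -> list[str]:
    """Every prefix of the normalized city name with at least min_length characters."""
    key = normalize(city)
    return [key[:i] for i in range(min_length, len(key) + 1)]


def should_search(user_input: str, min_length: int = 2) -> bool:
    """Skip the query until the normalized input has at least min_length characters."""
    return len(normalize(user_input)) >= min_length


print(search_terms("Austin"))
```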
  • 25. Postal Code Range Search • Use Case – Find nearby neighborhoods • Solution – Create a lookup table – RK: country|postal code
  • 26. Postal Code Range Search • Create Column Family cqlsh>CREATE TABLE zip_code_distance (location_country text, zip_code text, distance_e2 int, location text, PRIMARY KEY ((location_country, zip_code), distance_e2));
  • 27. Postal Code Range Search • Add Data cqlsh>INSERT INTO zip_code_distance (location_country, zip_code, distance_e2, location) VALUES('US', '78741', 0, '{json data for 78741}'); cqlsh>INSERT INTO zip_code_distance (location_country, zip_code, distance_e2, location) VALUES('US', '78741', 180, '{json data for 78702}'); cqlsh>INSERT INTO zip_code_distance (location_country, zip_code, distance_e2, location) VALUES('US', '78741', 220, '{json data for 78721}');
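The rows above are prebuilt by computing each origin's distance to every other postal code. A Python sketch under stated assumptions: haversine as the distance formula, `distance_e2` taken to be miles × 100 rounded, and the neighbor's postal code standing in for the `location` JSON; function names are hypothetical. It also flags a property of this schema: with `distance_e2` as the only clustering column, two neighbors at the same rounded distance share a primary key, and Cassandra's upsert semantics keep only the last INSERT.

```python
import math
from collections import Counter

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def distance_rows(country: str, origin_zip: str,
                  coords: dict[str, tuple[float, float]], max_miles: float):
    """Rows (location_country, zip_code, distance_e2, neighbor_zip) for one partition.

    coords maps zip_code -> (latitude, longitude). Rows are returned in
    clustering order (ascending distance_e2); the origin itself gets 0.
    """
    o_lat, o_lon = coords[origin_zip]
    rows = []
    for zip_code, (lat, lon) in coords.items():
        miles = haversine_miles(o_lat, o_lon, lat, lon)
        if miles <= max_miles:
            rows.append((country, origin_zip, round(miles * 100), zip_code))
    rows.sort(key=lambda row: row[2])
    return rows


def clustering_collisions(rows) -> list[int]:
    """distance_e2 values shared by two or more neighbors (these rows would overwrite)."""
    counts = Counter(row[2] for row in rows)
    return sorted(d for d, n in counts.items() if n > 1)
```

If collisions matter, adding the neighbor's postal code as a second clustering column after `distance_e2` would keep every row while preserving the distance ordering.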
  • 28. Postal Code Range Search • Search cqlsh>SELECT * FROM zip_code_distance WHERE location_country = 'US' AND zip_code = '78741' AND distance_e2 < 200 ORDER BY distance_e2; • Results
     location_country | zip_code | distance_e2 | location
    ------------------+----------+-------------+-----------------------
     US               | 78741    | 0           | {json data for 78741}
     US               | 78741    | 180         | {json data for 78702}
  • 29. Postal Code Range Search • Things to know – Row width: 1 to ~45K
  • 30. Distance Between Postal Codes • Use Case – Estimate the distance between postal codes • Solution – Create a lookup table – RK: country|postal code – CN: country|postal code – Value: distanceE2
  • 31. Distance Between Postal Codes • Create Column Family cqlsh>CREATE TABLE zip_code_distance_between (location_country_1 text, zip_code_1 text, location_country_2 text, zip_code_2 text, distance_e2 int, PRIMARY KEY ((location_country_1, zip_code_1), location_country_2, zip_code_2));
  • 32. Distance Between Postal Codes • Add Data cqlsh>INSERT INTO zip_code_distance_between (location_country_1, zip_code_1, location_country_2, zip_code_2, distance_e2) VALUES('US', '78741', 'US', '78741', 0); cqlsh>INSERT INTO zip_code_distance_between (location_country_1, zip_code_1, location_country_2, zip_code_2, distance_e2) VALUES('US', '78741', 'US', '78702', 180);
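One way to populate `zip_code_distance_between` is to compute every ordered pair, writing both (a, b) and (b, a) so a lookup from either side is a single-partition read. A Python sketch assuming haversine distances and `distance_e2` as miles × 100 rounded; the function name is hypothetical. Since every partition holds one row per postal code, the table grows quadratically: about 45K partitions × ~45K rows (slide 34's figure) is roughly 2 billion rows.

```python
import math
from itertools import product

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def pair_rows(country: str, coords: dict[str, tuple[float, float]]):
    """One row per ordered pair, including each postal code paired with itself.

    Row layout: (location_country_1, zip_code_1, location_country_2, zip_code_2, distance_e2).
    """
    rows = []
    for (zip_1, (lat1, lon1)), (zip_2, (lat2, lon2)) in product(coords.items(), repeat=2):
        miles = haversine_miles(lat1, lon1, lat2, lon2)
        rows.append((country, zip_1, country, zip_2, round(miles * 100)))
    return rows
```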
  • 33. Distance Between Postal Codes • Select cqlsh>SELECT * FROM zip_code_distance_between WHERE location_country_1 = 'US' AND zip_code_1 = '78741' AND location_country_2 = 'US' AND zip_code_2 = '78702'; • Results
     location_country_1 | zip_code_1 | location_country_2 | zip_code_2 | distance_e2
    --------------------+------------+--------------------+------------+-------------
     US                 | 78741      | US                 | 78702      | 180
  • 34. Distance Between Postal Codes • Things to know – Row width: ~45K
  • 35. Final Thoughts • Why just Cassandra? – Fewer technologies to support • Operations • Development – But be reasonable • Prebuild reference data – Consider prebuilding data to reduce read time
  • 36. Questions & Contact Info Matt Vorst CTO Physi, Inc. matt@physi.rocks