Geolocation with Cassandra
Austin Cassandra Users – Jan 21, 2016
Matt Vorst
• Cassandra User
– Since 2011
• Architect / Java developer
• Corporate Life
– EntekIRD & Rockwell Automation
• Serial Entrepreneur
– EventsInCincinnati.com – Co-founder
– Dotloop, Inc. – Co-founder and CTO
– Physi, Inc. – Co-founder and C*O
Physi [fiz-ee] (noun)
1. a mobile app that pairs nearby people to play sports
2. a movement to make a smaller, happier, healthier
world through play
Why Cassandra
• Operations is Hard
– Most relational databases don't scale easily or well
– Murphy’s Law always strikes at the worst time
– Recovery shouldn’t come at a high cost
• Distributed Design
– Cassandra is a distributed technology
– Applications are designed to be distributed
Necessary Location Services
• Proximity Search
– Postal code range search
– Distance between postal codes
• Location Conversion
– Postal code to latitude/longitude
– Latitude/longitude to postal code
• Search
– City name lookup
Setup
• Create the Keyspace
cqlsh> CREATE KEYSPACE physi WITH replication =
{'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> USE physi;
Postal Code to Latitude/Longitude
• Use Case
– Place markers on a map
• Solution
– Buy a postal code database
– PK: Country/postal code
• Create Column Family
cqlsh>CREATE TABLE zip_code_master (
location_country text, zip_code text, location_uuid uuid,
location_type text, city text, county text, state text,
latitude_e6 bigint, longitude_e6 bigint,
PRIMARY KEY (location_country, zip_code));
• Add data
cqlsh>INSERT INTO zip_code_master
(location_country, zip_code, location_uuid, location_type,
city, county, state, latitude_e6, longitude_e6)
VALUES('US', '45219',
7b0e6b7f-0d9a-3a66-9f9a-0df17ed5dc39,
'REGIONAL', 'Cincinnati', 'Hamilton', 'OH',
39127564, -84514489);
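The `_e6` suffix appears to mean degrees scaled by 10^6 and stored as an integer: the sample values 39127564 and -84514489 correspond to 39.127564, -84.514489, matching the Cincinnati row. Assuming that convention, a minimal Python sketch of the conversion (the helper names `to_e6` and `from_e6` are hypothetical):

```python
def to_e6(degrees: float) -> int:
    """Scale decimal degrees to an integer number of microdegrees (degrees x 10^6)."""
    # round() rather than int(): the float product can land just below the intended integer
    return round(degrees * 1_000_000)


def from_e6(value_e6: int) -> float:
    """Convert microdegrees back to decimal degrees."""
    return value_e6 / 1_000_000
```

Storing integers keeps the values exact and compact; 10^-6 of a degree of latitude is roughly 11 cm, far finer than postal code centroids need.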
• Search
cqlsh>SELECT * FROM zip_code_master WHERE
location_country = 'US' AND zip_code = '45219';
• Results
location_country | zip_code | city | county | latitude_e6 | location_type | location_uuid | longitude_e6 | state
------------------+----------+------------+----------+-------------+---------------+--------------------------------------+--------------+------
US | 45219 | Cincinnati | Hamilton | 39127564 | REGIONAL | 7b0e6b7f-0d9a-3a66-9f9a-0df17ed5dc39 | -84514489 | OH
• Things to Know
– Row width: ~10
– Postal codes cover different areas
– A single postal code can span multiple cities,
counties, and even states
– The largest postal code covers 10,000 mi²
Latitude/Longitude to Postal Code
• Use Case
– Determine server-side which postal code a user is currently in
– Use this to return suggestions
• The Relational Way
– Draw a bounding box, query it, then loop over the results and calculate distances
– Query:
SELECT * FROM location_table
WHERE (min lat) < latitude AND latitude < (max lat)
AND (min long) < longitude AND longitude < (max long)
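The relational query needs the box edges for a chosen search radius. A minimal Python sketch, assuming a spherical Earth and the ~69 mi per degree of latitude figure used in the next section; `bounding_box` is a hypothetical helper. The box is only a coarse filter: its corners lie farther away than the radius, so the "loop and calculate" step still has to compute real distances.

```python
import math

MILES_PER_DEGREE_LAT = 69.0  # approximate distance between degrees of latitude


def bounding_box(lat: float, lon: float, radius_miles: float) -> tuple[float, float, float, float]:
    """Return (min_lat, max_lat, min_long, max_long) for a box around (lat, lon).

    A degree of longitude spans ~69 * cos(latitude) miles, so the longitude
    delta is wider than the latitude delta away from the equator.
    Not valid near the poles or across the +/-180 degree meridian.
    """
    d_lat = radius_miles / MILES_PER_DEGREE_LAT
    d_lon = radius_miles / (MILES_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return lat - d_lat, lat + d_lat, lon - d_lon, lon + d_lon
```

The four returned values plug into the (min lat), (max lat), (min long), and (max long) placeholders of the query above.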
• Cassandra Solution
– Prebuild a lookup table
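• Slice the US into 0.1° cells, ~7 mi tall by ≤7 mi wide
• ~69 miles per degree of latitude, so 0.1° ≈ 7 mi
• Degrees of longitude shrink toward the poles (~69 × cos(latitude) mi), so cells narrow as latitude increases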
– PK: latE1|longE1
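Computing the partition key for a position is a matter of scaling to tenths of a degree. A minimal Python sketch with hypothetical helper names; it truncates toward zero with `int()` because that reproduces the (391, -845) key used for 45219 below, whereas `math.floor` would give -846 for the longitude.

```python
import math

MILES_PER_DEGREE_LAT = 69.0  # approximate, from the slide above


def cell_key(lat: float, lon: float) -> tuple[int, int]:
    """Return (latitude_e1, longitude_e1): the position in tenths of a degree, truncated toward zero."""
    return int(lat * 10), int(lon * 10)


def cell_size_miles(lat: float) -> tuple[float, float]:
    """Approximate (north-south, east-west) size in miles of a 0.1-degree cell at this latitude."""
    north_south = 0.1 * MILES_PER_DEGREE_LAT
    # degrees of longitude shrink with cos(latitude)
    east_west = north_south * math.cos(math.radians(lat))
    return north_south, east_west
```

At Cincinnati's latitude this gives cells of roughly 6.9 mi by 5.4 mi. Truncation does make the cells straddling 0° twice as wide, which does not matter for US data.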
• Cassandra Solution (cont.)
– Build: Add bordering postal codes
– Read: Loop and calculate distance
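One reading of these two steps: each postal code is written into its own cell and the eight cells bordering it, so a single-partition read returns every nearby candidate, and the client then picks the closest one. The Python sketch below assumes that reading, assumes each candidate carries a centroid latitude/longitude (for example inside the `location` JSON), and uses the haversine great-circle formula; all names are hypothetical. Nearest centroid only approximates "the postal code containing this point", since postal codes vary greatly in size.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def neighbor_cells(lat_e1: int, lon_e1: int) -> list[tuple[int, int]]:
    """Build side: the cell itself plus its 8 bordering cells (a 3 x 3 block)."""
    return [(lat_e1 + dy, lon_e1 + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_zip(lat: float, lon: float, candidates: list[tuple[str, float, float]]) -> str:
    """Read side: loop over (zip_code, lat, lon) candidates and return the closest zip code."""
    return min(candidates, key=lambda c: haversine_miles(lat, lon, c[1], c[2]))[0]
```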
• Create Column Family
cqlsh>CREATE TABLE latitude_longitude_zip_code
(latitude_e1 int, longitude_e1 int, location_country text,
zip_code text, location text,
PRIMARY KEY ((latitude_e1, longitude_e1),
location_country, zip_code));
• Add data
cqlsh>INSERT INTO latitude_longitude_zip_code
(latitude_e1, longitude_e1, location_country, zip_code,
location) VALUES(391,-845,'US','45219','{json data}');
cqlsh>INSERT INTO latitude_longitude_zip_code
(latitude_e1, longitude_e1, location_country, zip_code,
location) VALUES(391,-845,'US','45220','{json data}');
• Search
cqlsh>SELECT * FROM latitude_longitude_zip_code
WHERE latitude_e1 = 391 AND longitude_e1 = -845;
• Results
latitude_e1 | longitude_e1 | location_country | zip_code | location
-------------+--------------+------------------+----------+-------------
391 | -845 | US | 45206 | {json data}
391 | -845 | US | 45219 | {json data}
391 | -845 | US | 45220 | {json data}
• Things to Know
– Row width: 1 to ~50
– This was a short-lived solution
– We now primarily use client-side location services
– It is still used as a fallback for the web
– Building the lookup table took 3 hours on
a local machine with RAID 0 SSDs
City Name Lookup
• Use Case
– Auto-complete city name
• Solution
– Create a lookup
– RK: searchTerm
– CN: (0 padded count)|country|city
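RK and CN appear to be row key and column name from the older Thrift-style data model, where the columns in a row are kept sorted by name. Assuming a text comparator, the count must be zero-padded so that text order matches numeric order ("100" sorts before "31" as text, but "000100" sorts after "000031"). A minimal Python sketch; the padding width of 6 and the helper name are assumptions.

```python
def column_name(count: int, country: str, city: str, width: int = 6) -> str:
    """Build a 'count|country|city' column name whose text order matches the numeric order of count."""
    if count < 0 or count >= 10 ** width:
        raise ValueError(f"count {count} does not fit in {width} digits")
    return f"{count:0{width}d}|{country}|{city}"
```

The CQL table below makes the padding unnecessary: `occurrence_count` is an `int` clustering column, so Cassandra sorts it numerically.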
• Create Column Family
cqlsh>CREATE TABLE name_search
(search_term text, occurrence_count int,
location_country text, city text, state text, location text,
PRIMARY KEY ((search_term), occurrence_count,
location_country, city, state));
• Add data
cqlsh> INSERT INTO name_search
(search_term, occurrence_count, location_country, city,
state, location)
VALUES ('aus', 31, 'US', 'austin', 'TX', '{json data}');
cqlsh> INSERT INTO name_search
(search_term, occurrence_count, location_country, city,
state, location)
VALUES ('aus', 10, 'US', 'austell', 'GA', '{json data}');
• Search
cqlsh>SELECT * FROM name_search
WHERE search_term = 'aus'
ORDER BY occurrence_count DESC;
• Results
search_term | occurrence_count | location_country | city | state | location
-------------+------------------+------------------+-------------+-------+-------------
aus | 31 | US | austin | TX | {json data}
aus | 10 | US | austell | GA | {json data}
aus | 10 | US | ausablefork | NY | {json data}
• Things to Know
– Row width: 10 – 60K
– Remove whitespace and special characters, and convert
search terms to lowercase
– Only search once 2 or more characters have
been entered
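These rules imply that each city is stored once per prefix of its normalized name. A minimal Python sketch, assuming "special characters" means anything other than ASCII letters and digits and that accented letters are folded to their base letter first (both assumptions); the function names are hypothetical.

```python
import re
import unicodedata


def normalize(term: str) -> str:
    """Fold accents, lowercase, and drop everything except ASCII letters and digits."""
    decomposed = unicodedata.normalize("NFKD", term)        # 'ã' -> 'a' + combining tilde
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^0-9a-z]", "", ascii_only.lower())


def search_prefixes(city: str, min_len: int = 2) -> list[str]:
    """Every prefix of the normalized city name at least min_len characters long (one search_term each)."""
    name = normalize(city)
    return [name[:i] for i in range(min_len, len(name) + 1)]


def should_search(user_input: str, min_len: int = 2) -> bool:
    """Query only once the normalized input has at least min_len characters."""
    return len(normalize(user_input)) >= min_len
```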
Postal Code Range Search
• Use Case
– Find nearby neighborhoods
• Solution
– Create a lookup table
– RK: country|postal code
• Create Column Family
cqlsh>CREATE TABLE zip_code_distance
(location_country text, zip_code text, distance_e2 int,
location text,
PRIMARY KEY ((location_country, zip_code),
distance_e2));
• Add Data
cqlsh>INSERT INTO zip_code_distance
(location_country, zip_code, distance_e2, location)
VALUES('US', '78741', 0, '{json data for 78741}');
cqlsh>INSERT INTO zip_code_distance
(location_country, zip_code, distance_e2, location)
VALUES('US', '78741', 180, '{json data for 78702}');
cqlsh>INSERT INTO zip_code_distance
(location_country, zip_code, distance_e2, location)
VALUES('US', '78741', 220, '{json data for 78721}');
• Search
cqlsh>SELECT * FROM zip_code_distance
WHERE location_country = 'US' AND zip_code = '78741'
AND distance_e2 < 200
ORDER BY distance_e2;
• Results
location_country | zip_code | distance_e2 | location
------------------+----------+-------------+-----------------------
US | 78741 | 0 | {json data for 78741}
US | 78741 | 180 | {json data for 78702}
• Things to know
– Row width: 1 to ~45K
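Prebuilding a partition means computing the distance from one postal code to each of its neighbors. Assuming `distance_e2` is the distance in hundredths of a mile (the slides do not state the unit) and that each postal code has a centroid, a Python sketch might look like this (hypothetical names). One caveat follows directly from the schema: `distance_e2` is the only clustering column, so two neighbors at the same rounded distance share a primary key and the second insert silently overwrites the first. The sketch reports such collisions; adding the neighbor's postal code as a further clustering column would be one way to avoid them.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def build_partition(origin, neighbors, max_miles):
    """Return ({distance_e2: zip_code}, collisions) for one (country, zip_code) partition.

    origin and neighbors are (zip_code, lat, lon) tuples. The dict mimics the
    clustering key: a repeated distance_e2 replaces the earlier zip code, as a
    repeated INSERT with the same primary key would.
    """
    rows = {}
    collisions = []
    _, origin_lat, origin_lon = origin
    for zip_code, lat, lon in neighbors:
        miles = haversine_miles(origin_lat, origin_lon, lat, lon)
        if miles > max_miles:
            continue
        distance_e2 = round(miles * 100)  # hundredths of a mile (assumed unit)
        if distance_e2 in rows:
            collisions.append((distance_e2, rows[distance_e2], zip_code))
        rows[distance_e2] = zip_code
    return dict(sorted(rows.items())), collisions
```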
Distance Between Postal Codes
• Use Case
– Estimate the distance between postal
codes
• Solution
– Create a lookup table
– RK: country|postal code
– CN: country|postal code
– Value: distanceE2
• Create Column Family
cqlsh>CREATE TABLE zip_code_distance_between
(location_country_1 text, zip_code_1 text,
location_country_2 text, zip_code_2 text, distance_e2 int,
PRIMARY KEY ((location_country_1, zip_code_1),
location_country_2, zip_code_2));
• Add Data
cqlsh>INSERT INTO zip_code_distance_between
(location_country_1, zip_code_1, location_country_2,
zip_code_2, distance_e2)
VALUES('US', '78741', 'US', '78741', 0);
cqlsh>INSERT INTO zip_code_distance_between
(location_country_1, zip_code_1, location_country_2,
zip_code_2, distance_e2)
VALUES('US', '78741', 'US', '78702', 180);
• Select
cqlsh>SELECT * FROM zip_code_distance_between
WHERE location_country_1 = 'US'
AND zip_code_1 = '78741'
AND location_country_2 = 'US'
AND zip_code_2 = '78702';
• Results
location_country_1 | zip_code_1 | location_country_2 | zip_code_2 | distance_e2
--------------------+------------+--------------------+------------+-------------
US | 78741 | US | 78702 | 180
• Things to know
– Row width: ~45K
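Assuming one partition per postal code, each holding a row for every postal code (including itself at distance 0), the table grows quadratically: ~45K partitions × ~45K rows is roughly 2 × 10^9 rows. The Python sketch below, with a hypothetical `pairwise_rows` helper and a caller-supplied distance function, generates rows in the shape of this table's primary key and makes that count explicit.

```python
from itertools import product
from typing import Callable, Iterator

Code = tuple[str, str]  # (location_country, zip_code)


def pairwise_rows(
    codes: list[Code], distance_e2: Callable[[Code, Code], int]
) -> Iterator[tuple[str, str, str, str, int]]:
    """Yield (location_country_1, zip_code_1, location_country_2, zip_code_2, distance_e2) for every ordered pair.

    Both (a, b) and (b, a) are produced, so any pair can be read from the
    partition of either code with a single-row lookup.
    """
    for (c1, z1), (c2, z2) in product(codes, repeat=2):
        yield c1, z1, c2, z2, distance_e2((c1, z1), (c2, z2))
```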
Final Thoughts
• Why just Cassandra?
– Fewer technologies to support
• Operations
• Development
– But be reasonable
• Prebuild reference data
– Consider prebuilding data to reduce read time
Questions & Contact Info
Matt Vorst
CTO Physi, Inc.
matt@physi.rocks

call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 

Geolocation and Cassandra at Physi

  • 1. Geolocation with Cassandra Austin Cassandra Users – Jan 21, 2016
  • 2. Matt Vorst • Cassandra User – Since 2011 • Architect / Java developer • Corporate Life – EntekIRD & Rockwell Automation • Serial Entrepreneur – EventsInCincinnati.com – Co-founder – Dotloop, Inc. – Co-founder and CTO – Physi, Inc. – Co-founder and C*O
  • 3. Physi [fiz-ee] (noun) 1. a mobile app that pairs nearby people to play sports 2. a movement to make a smaller, happier, healthier world through play
  • 4. Why Cassandra • Operations is Hard – Most relational DBs don’t scale easily or well – Murphy’s Law always strikes at the worst time – Recovery shouldn’t come at a high cost • Distributed Design – Cassandra is a distributed technology – Applications are designed to be distributed
  • 5. Necessary Location Services • Proximity Search – Postal code range search – Distance between postal codes • Location Conversion – Postal code to latitude/longitude – Latitude/longitude to postal code • Search – City name lookup
  • 6. Setup • Create the Keyspace cqlsh> CREATE KEYSPACE physi WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; cqlsh> USE physi;
  • 7. Postal Code to Latitude/Longitude • Use Case – Place markers on a map • Solution – Buy a database – PK: Country/postal code
  • 8. Postal Code to Latitude/Longitude • Create Column Family cqlsh>CREATE TABLE zip_code_master ( location_country text, zip_code text, location_uuid uuid, location_type text, city text, county text, state text, latitude_e6 bigint, longitude_e6 bigint, PRIMARY KEY (location_country, zip_code));
  • 9. Postal Code to Latitude/Longitude • Add data cqlsh>INSERT INTO zip_code_master (location_country, zip_code, location_uuid, location_type, city, county, state, latitude_e6, longitude_e6) VALUES('US', '45219', 7b0e6b7f-0d9a-3a66-9f9a-0df17ed5dc39, 'REGIONAL', 'Cincinnati', 'Hamilton', 'OH', 39127564, -84514489);
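The latitude_e6 and longitude_e6 columns hold coordinates as integers scaled by 10^6 (micro-degrees), so every stored value is an exact integer; one micro-degree of latitude is roughly 11 cm. The slides don't show the conversion code, so the following is a minimal Python sketch; the helper names to_e6 and from_e6 are hypothetical.

```python
# Minimal sketch (hypothetical helpers): convert between decimal degrees and the
# integer micro-degree ("_e6") values stored in zip_code_master.


def to_e6(degrees: float) -> int:
    """Decimal degrees -> integer micro-degrees (degrees * 10^6).

    Uses round() rather than int(): binary floating point can put a product a
    hair below the intended integer (0.29 * 100 evaluates to 28.999999999999996),
    and int() would truncate that down by one.
    """
    return round(degrees * 1_000_000)


def from_e6(value_e6: int) -> float:
    """Integer micro-degrees -> decimal degrees."""
    return value_e6 / 1_000_000


if __name__ == "__main__":
    # Coordinates from the 45219 (Cincinnati) INSERT above.
    print(to_e6(39.127564), to_e6(-84.514489))
```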
  • 10. Postal Code to Latitude/Longitude • Search cqlsh>SELECT * FROM zip_code_master WHERE location_country = 'US' AND zip_code = '45219'; • Results
     location_country | zip_code | city       | county   | latitude_e6 | location_type | location_uuid                        | longitude_e6 | state
    ------------------+----------+------------+----------+-------------+---------------+--------------------------------------+--------------+------
     US               | 45219    | Cincinnati | Hamilton | 39127564    | REGIONAL      | 7b0e6b7f-0d9a-3a66-9f9a-0df17ed5dc39 | -84514489    | OH
  • 11. Postal Code to Latitude/Longitude • Things to Know – Row width: ~10 – Postal codes vary widely in the area they cover – A single postal code can span different cities, counties, and even states – The largest postal code covers 10,000 mi²
  • 12. Latitude/Longitude to Postal Code • Use Case – Determine, server-side, which postal code a user is currently in – Use this to return suggestions
  • 13. Latitude/Longitude to Postal Code • The Relational Way – Draw a box, loop, and calculate – Query: SELECT * FROM location_table WHERE (min lat) < latitude AND latitude < (max lat) AND (min long) < longitude AND longitude < (max long)
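To fill in the (min lat), (max lat), (min long) and (max long) placeholders in that query, an application has to turn a search radius into a box. A minimal sketch, assuming a flat-earth approximation that uses the ~69 miles per degree of latitude figure quoted in this deck and shrinks longitude degrees by cos(latitude); bounding_box is a hypothetical name, and the approximation breaks down near the poles and across the antimeridian.

```python
import math

MILES_PER_DEGREE_LAT = 69.0  # approximate figure quoted in the slides


def bounding_box(lat: float, lon: float, radius_miles: float):
    """Return (min_lat, max_lat, min_lon, max_lon) enclosing a circle of
    radius_miles around (lat, lon).

    Flat-earth approximation: reasonable for short radii in the continental US,
    not valid near the poles or across the antimeridian.
    """
    d_lat = radius_miles / MILES_PER_DEGREE_LAT
    # Meridians converge, so one degree of longitude covers fewer miles
    # the farther the point is from the equator.
    d_lon = radius_miles / (MILES_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return lat - d_lat, lat + d_lat, lon - d_lon, lon + d_lon


if __name__ == "__main__":
    min_lat, max_lat, min_lon, max_lon = bounding_box(39.127564, -84.514489, 7.0)
    # These four values would be bound into:
    # (min lat) < latitude < (max lat) AND (min long) < longitude < (max long)
    print(min_lat, max_lat, min_lon, max_lon)
```

The box is a superset of the circle, which is why the relational approach still has to loop over the matches and calculate real distances.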
  • 14. Latitude/Longitude to Postal Code • Cassandra Solution – Prebuild a lookup table • Slice the US into cells of 0.1° latitude by 0.1° longitude (~7 mi by ≤7 mi) • ~69 miles per degree of latitude • Longitude lines are not equally spaced; they converge toward the poles – PK: latE1|longE1
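The latE1|longE1 key is each coordinate in tenths of a degree. Since a degree of latitude is ~69 miles, a 0.1° cell is ~6.9 miles tall, while its width is ~6.9 × cos(latitude) miles, hence 7 mi by ≤7 mi. A minimal sketch of computing the key; grid_key and cell_size_miles are hypothetical names. Truncating toward zero is an assumption, but it is the rounding that maps Cincinnati's longitude -84.514489 to the -845 used in the examples that follow (floor would give -846).

```python
import math
from typing import Tuple

MILES_PER_DEGREE_LAT = 69.0  # approximate figure quoted in the slides


def grid_key(lat: float, lon: float) -> Tuple[int, int]:
    """Partition key (latitude_e1, longitude_e1): degrees in tenths.

    int() truncates toward zero, so -845.14 -> -845 (math.floor would give -846).
    """
    return int(lat * 10), int(lon * 10)


def cell_size_miles(lat: float) -> Tuple[float, float]:
    """Approximate (height, width) in miles of a 0.1-degree cell at this latitude."""
    height = 0.1 * MILES_PER_DEGREE_LAT
    width = height * math.cos(math.radians(lat))
    return height, width


if __name__ == "__main__":
    print(grid_key(39.127564, -84.514489))  # (391, -845)
    print(cell_size_miles(39.1))            # roughly 6.9 mi tall, 5.4 mi wide
```

Because int() truncates toward zero, the cells touching the equator and the prime meridian would be twice as wide as the rest; that does not affect US data.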
  • 15. Latitude/Longitude to Postal Code • Cassandra Solution (cont.) – Build: Add bordering postal codes – Read: Loop and calculate distance
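The slide doesn't say how "bordering" postal codes were determined (that likely required boundary data). As a stand-in, here is a minimal sketch assuming only postal-code centroids are available: each code is added to its own 0.1° cell and to the eight surrounding cells, so a lookup from a point near a cell edge still sees nearby codes. build_lookup and grid_key are hypothetical names; the resulting {(latitude_e1, longitude_e1): {zip_code, ...}} map corresponds to the rows of latitude_longitude_zip_code.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]


def grid_key(lat: float, lon: float) -> Cell:
    """(latitude_e1, longitude_e1), truncating toward zero."""
    return int(lat * 10), int(lon * 10)


def build_lookup(centroids: Dict[str, Tuple[float, float]]) -> Dict[Cell, Set[str]]:
    """Map each 0.1-degree cell to the postal codes to return for it.

    Stand-in for "add bordering postal codes": every code goes into the cell
    containing its centroid plus the 8 neighbouring cells.
    """
    table: Dict[Cell, Set[str]] = defaultdict(set)
    for zip_code, (lat, lon) in centroids.items():
        lat_e1, lon_e1 = grid_key(lat, lon)
        for d_lat in (-1, 0, 1):
            for d_lon in (-1, 0, 1):
                table[(lat_e1 + d_lat, lon_e1 + d_lon)].add(zip_code)
    return dict(table)


if __name__ == "__main__":
    lookup = build_lookup({"45219": (39.127564, -84.514489)})
    # Each (cell, zip_code) pair would become one INSERT into the lookup table.
    for (lat_e1, lon_e1), zips in sorted(lookup.items()):
        print(lat_e1, lon_e1, sorted(zips))
```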
  • 16. Latitude/Longitude to Postal Code • Create Column Family cqlsh>CREATE TABLE latitude_longitude_zip_code (latitude_e1 int, longitude_e1 int, location_country text, zip_code text, location text, PRIMARY KEY ((latitude_e1, longitude_e1), location_country, zip_code));
  • 17. Latitude/Longitude to Postal Code • Add data cqlsh>INSERT INTO latitude_longitude_zip_code (latitude_e1, longitude_e1, location_country, zip_code, location) VALUES(391,-845,'US','45219','{json data}'); cqlsh>INSERT INTO latitude_longitude_zip_code (latitude_e1, longitude_e1, location_country, zip_code, location) VALUES(391,-845,'US','45220','{json data}');
  • 18. Latitude/Longitude to Postal Code • Search cqlsh>SELECT * FROM latitude_longitude_zip_code WHERE latitude_e1 = 391 AND longitude_e1 = -845; • Results
     latitude_e1 | longitude_e1 | location_country | zip_code | location
    -------------+--------------+------------------+----------+-------------
     391         | -845         | US               | 45206    | {json data}
     391         | -845         | US               | 45219    | {json data}
     391         | -845         | US               | 45220    | {json data}
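On read, the partition for the user's cell returns a handful of candidate codes, and the application keeps the closest one ("loop and calculate distance"). The slides don't name the distance formula; this minimal sketch uses the haversine great-circle distance and assumes each row's location JSON carries the code's centroid as latitude_e6/longitude_e6, which is a guess about the {json data} payload. nearest_zip and haversine_miles are hypothetical names.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_zip(lat: float, lon: float,
                candidates: Iterable[Tuple[str, int, int]]) -> str:
    """Return the zip_code of the candidate (zip_code, latitude_e6, longitude_e6)
    closest to (lat, lon).

    candidates stands in for the rows returned by the cell query, with each
    centroid assumed to be decoded from the row's location JSON.
    """
    best = min(
        candidates,
        key=lambda c: haversine_miles(lat, lon, c[1] / 1e6, c[2] / 1e6),
    )
    return best[0]
```

For example, nearest_zip(lat, lon, rows) with rows decoded from the three results above would pick whichever of 45206, 45219 or 45220 has the closest centroid.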
  • 19. Latitude/Longitude to Postal Code • Things to Know – Row width: 1 to ~50 – This was a short-lived solution – Now primarily using client-side location services – Still used as a fallback for web – Building the lookup table took 3 hours on localhost with RAID 0 SSDs
  • 20. City Name Lookup • Use Case – Auto-complete city name • Solution – Create a lookup – RK: searchTerm – CN: (0 padded count)|country|city
  • 21. City Name Lookup • Create Column Family cqlsh>CREATE TABLE name_search (search_term text, occurrence_count int, location_country text, city text, state text, location text, PRIMARY KEY ((search_term), occurrence_count, location_country, city, state));
  • 22. City Name Lookup • Add data cqlsh> INSERT INTO name_search (search_term, occurrence_count, location_country, city, state, location) VALUES ('aus', 31, 'US', 'austin', 'TX', '{json data}'); cqlsh> INSERT INTO name_search (search_term, occurrence_count, location_country, city, state, location) VALUES ('aus', 10, 'US', 'austell', 'GA', '{json data}');
  • 23. City Name Lookup • Search cqlsh>SELECT * FROM name_search WHERE search_term = 'aus' ORDER BY occurrence_count DESC; • Results
     search_term | occurrence_count | location_country | city        | state | location
    -------------+------------------+------------------+-------------+-------+-------------
     aus         | 31               | US               | austin      | TX    | {json data}
     aus         | 10               | US               | austell     | GA    | {json data}
     aus         | 10               | US               | ausablefork | NY    | {json data}
  • 24. City Name Lookup • Things to Know – Row width: 10 to 60K – Remove whitespace and special characters, and convert search terms to lowercase – Only search when 2 or more characters have been entered
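The normalization rules above can be sketched as follows: every prefix of a normalized city name, from two characters up, becomes a search_term partition, and user input is normalized the same way before querying. normalize, search_terms and should_search are hypothetical names, and this regex also drops accented letters, which may differ from what Physi did. The "(0 padded count)" in the Thrift-style design mattered because string column names sort lexically and zero padding makes lexical order match numeric order; with occurrence_count as a CQL int clustering column, no padding is needed.

```python
import re
from typing import List

MIN_SEARCH_LENGTH = 2  # only search once 2 or more characters are entered


def normalize(text: str) -> str:
    """Lowercase, then drop whitespace and any character that is not a-z or 0-9."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def search_terms(city: str) -> List[str]:
    """Prefixes of the normalized name, each a search_term partition key."""
    name = normalize(city)
    return [name[:i] for i in range(MIN_SEARCH_LENGTH, len(name) + 1)]


def should_search(user_input: str) -> bool:
    """Skip the query until the normalized input is long enough."""
    return len(normalize(user_input)) >= MIN_SEARCH_LENGTH


if __name__ == "__main__":
    print(search_terms("Austin"))  # ['au', 'aus', 'aust', 'austi', 'austin']
    print(normalize("St. Louis"))  # stlouis
    print(should_search("A "))     # False
```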
  • 25. Postal Code Range Search • Use Case – Find nearby neighborhoods • Solution – Create a lookup table – RK: country|postal code
  • 26. Postal Code Range Search • Create Column Family cqlsh>CREATE TABLE zip_code_distance (location_country text, zip_code text, distance_e2 int, location text, PRIMARY KEY ((location_country, zip_code), distance_e2));
  • 27. Postal Code Range Search • Add Data cqlsh>INSERT INTO zip_code_distance (location_country, zip_code, distance_e2, location) VALUES('US', '78741', 0, '{json data for 78741}'); cqlsh>INSERT INTO zip_code_distance (location_country, zip_code, distance_e2, location) VALUES('US', '78741', 180, '{json data for 78702}'); cqlsh>INSERT INTO zip_code_distance (location_country, zip_code, distance_e2, location) VALUES('US', '78741', 220, '{json data for 78721}');
  • 28. Postal Code Range Search • Search cqlsh>SELECT * FROM zip_code_distance WHERE location_country = 'US' AND zip_code = '78741' AND distance_e2 < 200 ORDER BY distance_e2; • Results
     location_country | zip_code | distance_e2 | location
    ------------------+----------+-------------+-----------------------
     US               | 78741    | 0           | {json data for 78741}
     US               | 78741    | 180         | {json data for 78702}
  • 29. Postal Code Range Search • Things to know – Row width: 1 to ~45K
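Prebuilding zip_code_distance means computing, for each origin code, the distance to every other code and storing it as distance_e2. The unit isn't stated on the slides; this minimal sketch assumes hundredths of a mile and uses a haversine distance, with hypothetical function names and made-up centroids for the demo. within() mimics the distance_e2 < 200 slice that Cassandra serves directly from the partition's clustering order.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def distance_rows(origin: str,
                  centroids: Dict[str, Tuple[float, float]]) -> List[Tuple[int, str]]:
    """Rows of the (country, origin) partition as (distance_e2, zip_code),
    sorted the way the distance_e2 clustering column would order them."""
    o_lat, o_lon = centroids[origin]
    rows = [
        (round(haversine_miles(o_lat, o_lon, lat, lon) * 100), zip_code)
        for zip_code, (lat, lon) in centroids.items()
    ]
    return sorted(rows)


def within(rows: List[Tuple[int, str]], max_distance_e2: int) -> List[Tuple[int, str]]:
    """Mimic: WHERE ... AND distance_e2 < max_distance_e2 ORDER BY distance_e2."""
    return [row for row in rows if row[0] < max_distance_e2]


if __name__ == "__main__":
    # Made-up centroids for illustration only; not real postal-code data.
    demo = {"A": (30.00, -97.00), "B": (30.02, -97.00), "C": (30.10, -97.00)}
    print(within(distance_rows("A", demo), 200))
```

One consequence of PRIMARY KEY ((location_country, zip_code), distance_e2) is that two codes at exactly the same rounded distance from the origin share a primary key, so the second INSERT overwrites the first. Adding the neighbouring code as a further clustering column after distance_e2 would keep both rows; the sketch keeps (distance_e2, zip_code) pairs for that reason.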
  • 30. Distance Between Postal Codes • Use Case – Estimate the distance between postal codes • Solution – Create a lookup table – RK: country|postal code – CN: country|postal code – Value: distanceE2
  • 31. Distance Between Postal Codes • Create Column Family cqlsh>CREATE TABLE zip_code_distance_between (location_country_1 text, zip_code_1 text, location_country_2 text, zip_code_2 text, distance_e2 int, PRIMARY KEY ((location_country_1, zip_code_1), location_country_2, zip_code_2));
  • 32. Distance Between Postal Codes • Add Data cqlsh>INSERT INTO zip_code_distance_between (location_country_1, zip_code_1, location_country_2, zip_code_2, distance_e2) VALUES('US', '78741', 'US', '78741', 0); cqlsh>INSERT INTO zip_code_distance_between (location_country_1, zip_code_1, location_country_2, zip_code_2, distance_e2) VALUES('US', '78741', 'US', '78702', 180);
  • 33. Distance Between Postal Codes • Select cqlsh>SELECT * FROM zip_code_distance_between WHERE location_country_1 = 'US' AND zip_code_1 = '78741' AND location_country_2 = 'US' AND zip_code_2 = '78702'; • Results
     location_country_1 | zip_code_1 | location_country_2 | zip_code_2 | distance_e2
    --------------------+------------+--------------------+------------+-------------
     US                 | 78741      | US                 | 78702      | 180
  • 34. Distance Between Postal Codes • Things to know – Row width: ~45K
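zip_code_distance_between holds one row per ordered pair of codes, so the point lookup works whichever code the caller treats as zip_code_1. With ~45K partitions of ~45K rows each, that is on the order of 45K × 45K ≈ 2 billion rows. A minimal sketch of generating the rows, computing each distance once and writing it in both directions; function names are hypothetical, the country columns are left out (all codes assumed to be US), distance_e2 is assumed to be hundredths of a mile, and haversine is used as the distance formula.

```python
import math
from itertools import combinations
from typing import Dict, Tuple

EARTH_RADIUS_MILES = 3958.8

LatLon = Tuple[float, float]
Pair = Tuple[str, str]


def haversine_miles(p: LatLon, q: LatLon) -> float:
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2)
         * math.sin(math.radians(q[1] - p[1]) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def pair_rows(centroids: Dict[str, LatLon]) -> Dict[Pair, int]:
    """Map (zip_code_1, zip_code_2) -> distance_e2 for every ordered pair."""
    rows: Dict[Pair, int] = {(z, z): 0 for z in centroids}
    for z1, z2 in combinations(centroids, 2):
        d_e2 = round(haversine_miles(centroids[z1], centroids[z2]) * 100)
        # Store both directions so a lookup works in either order.
        rows[(z1, z2)] = d_e2
        rows[(z2, z1)] = d_e2
    return rows


if __name__ == "__main__":
    # Made-up centroids for illustration only; not real postal-code data.
    demo = {"A": (30.00, -97.00), "B": (30.02, -97.00), "C": (30.10, -97.00)}
    for (z1, z2), d in sorted(pair_rows(demo).items()):
        print(z1, z2, d)
```

With n codes this produces n² rows, which is what makes the table wide; the trade-off is a single-partition, single-row read at query time.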
  • 35. Final Thoughts • Why just Cassandra? – Fewer technologies to support • Operations • Development – But be reasonable • Prebuild reference data – Consider prebuilding data to reduce read time
  • 36. Questions & Contact Info Matt Vorst CTO Physi, Inc. matt@physi.rocks