2. Matt Vorst
• Cassandra User
– Since 2011
• Architect / Java developer
• Corporate Life
– EntekIRD & Rockwell Automation
• Serial Entrepreneur
– EventsInCincinnati.com – Co-founder
– Dotloop, Inc. – Co-founder and CTO
– Physi, Inc. – Co-founder and C*O
3. Physi [fiz-ee] (noun)
1. a mobile app that pairs nearby people to play sports
2. a movement to make a smaller, happier, healthier
world through play
4. Why Cassandra
• Operations is Hard
– Most relational DBs don’t scale easily or well
– Murphy’s Law always strikes at the worst time
– Recovery shouldn’t come at a high cost
• Distributed Design
– Cassandra is a distributed technology
– Applications are designed to be distributed
5. Necessary Location Services
• Proximity Search
– Postal code range search
– Distance between postal codes
• Location Conversion
– Postal code to latitude/longitude
– Latitude/longitude to postal code
• Search
– City name lookup
6. Setup
• Create the Keyspace
cqlsh> CREATE KEYSPACE physi WITH replication =
{'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> USE physi;
7. Postal Code to Latitude/Longitude
• Use Case
– Place markers on a map
• Solution
– Buy a database
– PK: Country/postal code
8. Postal Code to Latitude/Longitude
• Create Column Family
cqlsh>CREATE TABLE zip_code_master (
location_country text, zip_code text, location_uuid uuid,
location_type text, city text, county text, state text,
latitude_e6 bigint, longitude_e6 bigint,
PRIMARY KEY (location_country, zip_code));
9. Postal Code to Latitude/Longitude
• Add data
cqlsh>INSERT INTO zip_code_master
(location_country, zip_code, location_uuid, location_type,
city, county, state, latitude_e6, longitude_e6)
VALUES('US','45219',
7b0e6b7f-0d9a-3a66-9f9a-0df17ed5dc39,
'REGIONAL','Cincinnati','Hamilton','OH',
39127564,-84514489);
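The latitude_e6 and longitude_e6 columns hold coordinates as integers rather than floating-point degrees. A minimal sketch of that conversion, assuming "E6" means decimal degrees multiplied by 1,000,000 (which matches the Cincinnati row above); the helper names to_e6 and from_e6 are hypothetical:

```python
# Sketch only: helper names are hypothetical, and the "E6 = degrees * 1e6"
# reading is an assumption based on the sample row for 45219.

def to_e6(degrees: float) -> int:
    """Convert decimal degrees to an integer number of microdegrees."""
    return round(degrees * 1_000_000)


def from_e6(value_e6: int) -> float:
    """Convert microdegrees back to decimal degrees."""
    return value_e6 / 1_000_000


lat_e6 = to_e6(39.127564)
lon_e6 = to_e6(-84.514489)
print(lat_e6, lon_e6)
```

Storing integers keeps comparisons exact and avoids floating-point drift when the same coordinate is written from different clients.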
10. Postal Code to Latitude/Longitude
• Search
cqlsh>SELECT * FROM zip_code_master WHERE
location_country = 'US' AND zip_code = '45219';
• Results
location_country | zip_code | city | county | latitude_e6 | location_type | location_uuid | longitude_e6 | state
------------------+----------+------------+----------+-------------+---------------+--------------------------------------+--------------+------
US | 45219 | Cincinnati | Hamilton | 39127564 | REGIONAL | 7b0e6b7f-0d9a-3a66-9f9a-0df17ed5dc39 | -84514489 | OH
11. Postal Code to Latitude/Longitude
• Things to Know
– Row width: ~10
– Postal codes cover different areas
– A single postal code can span different cities,
counties, and even states
– The largest postal code covers 10,000 mi²
12. Latitude/Longitude to Postal Code
• Use Case
– Determine, server side, which postal code a
user is currently in
– Use this to return suggestions
13. Latitude/Longitude to Postal Code
• The Relational Way
– Draw a box, loop, and calculate
– Query:
SELECT * FROM location_table
WHERE latitude > :min_lat AND latitude < :max_lat
AND longitude > :min_long AND longitude < :max_long;
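The "draw a box" step of the relational approach can be sketched as follows. This is an illustration under stated assumptions, not the author's code: the function names are hypothetical, ~69 miles per degree of latitude is an approximation, and the sketch ignores the poles and the antimeridian.

```python
import math

MILES_PER_DEGREE_LAT = 69.0  # approximation; assumption for this sketch


def bounding_box(lat, lon, radius_miles):
    """Return (min_lat, max_lat, min_lon, max_lon) around a point."""
    dlat = radius_miles / MILES_PER_DEGREE_LAT
    # A degree of longitude shrinks by cos(latitude), so the box must
    # span more degrees east-west to cover the same number of miles.
    dlon = radius_miles / (MILES_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return (lat - dlat, lat + dlat, lon - dlon, lon + dlon)


def in_box(box, lat, lon):
    """Mirror the WHERE clause: strict comparisons on both axes."""
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat < lat < max_lat and min_lon < lon < max_lon


box = bounding_box(39.127564, -84.514489, 7.0)
print(box)
```

The four values of the box would be bound to the query parameters; the rows that come back still need a per-row distance calculation, because the corners of the box lie farther away than the radius.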
14. Latitude/Longitude to Postal Code
• Cassandra Solution
– Prebuild a lookup table
• Slice the US into cells of 0.1° latitude by 0.1° longitude (about 7 mi by <=7 mi)
• ~69 miles per degree of latitude
• Degrees of longitude narrow toward the poles, so cell width varies
– PK: latE1|longE1
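A sketch of how a coordinate might map to its latE1|longE1 cell. The truncation rule is an assumption: it is chosen because it reproduces the sample key (391, -845) for the 45219 coordinates, and rounding would as well. The function names are hypothetical.

```python
import math

MILES_PER_DEGREE_LAT = 69.0  # approximate figure quoted on the slide


def cell_key(lat, lon):
    """Map a coordinate to its (latitude_e1, longitude_e1) cell.

    Assumption: truncate toward zero after scaling by 10. This matches
    (39.127564, -84.514489) -> (391, -845). Truncation would merge the
    cells on either side of 0 degrees, which does not matter for the US.
    """
    return (int(lat * 10), int(lon * 10))


def cell_width_miles(lat):
    """Approximate east-west width of a 0.1-degree cell at a latitude."""
    return 0.1 * MILES_PER_DEGREE_LAT * math.cos(math.radians(lat))


print(cell_key(39.127564, -84.514489))
print(cell_width_miles(39.1))
```

The second function shows why the cells are "<=7 mi" wide: the north-south side stays near 6.9 miles, while the east-west side shrinks with the cosine of the latitude.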
15. Latitude/Longitude to Postal Code
• Cassandra Solution (cont.)
– Build: Add bordering postal codes
– Read: Loop and calculate distance
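The read-side "loop and calculate distance" step could look like the sketch below. It assumes each candidate row carries a centroid latitude and longitude (the contents of the location JSON are not shown in the slides) and uses the haversine formula, which is a swapped-in choice rather than the author's stated method. Only the 45219 coordinates come from the slides; the other two points are made up for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_zip(lat, lon, candidates):
    """candidates: iterable of (zip_code, centroid_lat, centroid_lon)."""
    best = min(candidates,
               key=lambda c: haversine_miles(lat, lon, c[1], c[2]))
    return best[0]


# Rows as they might come back for cell (391, -845).
candidates = [
    ("45219", 39.127564, -84.514489),  # from the slides
    ("45220", 39.15, -84.52),          # illustrative, not a real centroid
    ("45206", 39.13, -84.48),          # illustrative, not a real centroid
]
print(nearest_zip(39.128, -84.515, candidates))
```

Picking the nearest centroid only approximates "which postal code contains this point", which is why the build step adds bordering postal codes to every cell.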
16. Latitude/Longitude to Postal Code
• Create Column Family
cqlsh>CREATE TABLE latitude_longitude_zip_code
(latitude_e1 int, longitude_e1 int, location_country text,
zip_code text, location text,
PRIMARY KEY ((latitude_e1, longitude_e1),
location_country, zip_code));
17. Latitude/Longitude to Postal Code
• Add data
cqlsh>INSERT INTO latitude_longitude_zip_code
(latitude_e1, longitude_e1, location_country, zip_code,
location) VALUES(391,-845,'US','45219','{json data}');
cqlsh>INSERT INTO latitude_longitude_zip_code
(latitude_e1, longitude_e1, location_country, zip_code,
location) VALUES(391,-845,'US','45220','{json data}');
18. Latitude/Longitude to Postal Code
• Search
cqlsh>SELECT * FROM latitude_longitude_zip_code
WHERE latitude_e1 = 391 AND longitude_e1 = -845;
• Results
latitude_e1 | longitude_e1 | location_country | zip_code | location
-------------+--------------+------------------+----------+-------------
391 | -845 | US | 45206 | {json data}
391 | -845 | US | 45219 | {json data}
391 | -845 | US | 45220 | {json data}
19. Latitude/Longitude to Postal Code
• Things to Know
– Row width: 1 to ~50
– This was a short-lived solution
– Primarily using client location services
– Still used as a fallback for web
– Creation of the lookup table took 3 hours on
localhost with RAID 0 SSDs
20. City Name Lookup
• Use Case
– Auto-complete city name
• Solution
– Create a lookup
– RK (row key): searchTerm
– CN (column name): (0 padded count)|country|city
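Zero padding matters when the count is part of a string column name, because strings sort character by character. A small sketch of the idea; the width of 6 digits, the column_name helper, and the third entry (count 9, city "example") are assumptions for illustration only.

```python
def column_name(count: int, country: str, city: str, width: int = 6) -> str:
    """Build a composite name whose string order matches numeric order."""
    return f"{count:0{width}d}|{country}|{city}"


# Without padding, "9|..." sorts after "31|..." because '9' > '3'.
unpadded = sorted(["31|US|austin", "10|US|austell", "9|US|example"])

# With padding, lexicographic order and numeric order agree.
padded = sorted([
    column_name(31, "US", "austin"),
    column_name(10, "US", "austell"),
    column_name(9, "US", "example"),
])
print(unpadded)
print(padded)
```

In the CQL table on the next slide, occurrence_count is an int clustering column, which sorts numerically, so the padding is only needed when the count is embedded in a text value.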
21. City Name Lookup
• Create Column Family
cqlsh>CREATE TABLE name_search
(search_term text, occurrence_count int,
location_country text, city text, state text, location text,
PRIMARY KEY ((search_term), occurrence_count,
location_country, city, state));
22. City Name Lookup
• Add data
cqlsh> INSERT INTO name_search
(search_term, occurrence_count, location_country, city,
state, location)
VALUES ('aus', 31, 'US', 'austin', 'TX', '{json data}');
cqlsh> INSERT INTO name_search
(search_term, occurrence_count, location_country, city,
state, location)
VALUES ('aus', 10, 'US', 'austell', 'GA', '{json data}');
23. City Name Lookup
• Search
cqlsh>SELECT * FROM name_search
WHERE search_term = 'aus'
ORDER BY occurrence_count DESC;
• Results
search_term | occurrence_count | location_country | city | state | location
-------------+------------------+------------------+-------------+-------+-------------
aus | 31 | US | austin | TX | {json data}
aus | 10 | US | austell | GA | {json data}
aus | 10 | US | ausablefork | NY | {json data}
24. City Name Lookup
• Things to Know
– Row width: 10 to 60K
– Remove whitespace and special characters, and convert
search terms to lowercase
– Only search when 2 or more characters have
been entered
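The normalization rules above can be sketched as follows. Storing one row per prefix is an assumption inferred from the 'aus' example, not something the slides state, and the function names are hypothetical. The regular expression keeps only ASCII letters and digits, which is a simplification that would also drop accented characters.

```python
import re

MIN_TERM_LENGTH = 2  # slide: only search once 2 or more characters are entered


def normalize(text: str) -> str:
    """Lowercase the text and drop whitespace and special characters."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def search_terms(city: str) -> list:
    """All prefixes of the normalized name, from MIN_TERM_LENGTH upward."""
    name = normalize(city)
    return [name[:n] for n in range(MIN_TERM_LENGTH, len(name) + 1)]


print(normalize("St. Louis"))
print(search_terms("Austin"))
```

The same normalize function would be applied to what the user types before querying, so that the stored search_term and the lookup key always agree.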
25. Postal Code Range Search
• Use Case
– Find nearby neighborhoods
• Solution
– Create a lookup table
– RK: country|postal code
26. Postal Code Range Search
• Create Column Family
cqlsh>CREATE TABLE zip_code_distance
(location_country text, zip_code text, distance_e2 int,
location text,
PRIMARY KEY ((location_country, zip_code),
distance_e2));
27. Postal Code Range Search
• Add Data
cqlsh>INSERT INTO zip_code_distance
(location_country, zip_code, distance_e2, location)
VALUES('US', '78741', 0, '{json data for 78741}');
cqlsh>INSERT INTO zip_code_distance
(location_country, zip_code, distance_e2, location)
VALUES('US', '78741', 180, '{json data for 78702}');
cqlsh>INSERT INTO zip_code_distance
(location_country, zip_code, distance_e2, location)
VALUES('US', '78741', 220, '{json data for 78721}');
28. Postal Code Range Search
• Search
cqlsh>SELECT * FROM zip_code_distance
WHERE location_country = 'US' AND zip_code = '78741'
AND distance_e2 < 200
ORDER BY distance_e2;
• Results
location_country | zip_code | distance_e2 | location
------------------+----------+-------------+-----------------------
US | 78741 | 0 | {json data for 78741}
US | 78741 | 180 | {json data for 78702}
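Because distance_e2 is the clustering column, rows within a partition are stored in distance order and the range query becomes a slice. The sketch below mimics that with a sorted list. It assumes distance_e2 is a distance multiplied by 100 (the unit is not stated in the slides; miles are used here), and the helper names are hypothetical.

```python
from bisect import bisect_left


def miles_to_e2(miles: float) -> int:
    """Scale a distance to an integer number of hundredths (assumed unit: miles)."""
    return round(miles * 100)


# Rows of the ('US', '78741') partition, kept sorted by distance_e2 the way
# a clustering column would keep them. Values are taken from the slides.
rows = [(0, "78741"), (180, "78702"), (220, "78721")]


def within(rows, max_e2):
    """Return rows with distance_e2 < max_e2, like the CQL range predicate."""
    distances = [d for d, _ in rows]
    return rows[:bisect_left(distances, max_e2)]


print(within(rows, miles_to_e2(2.0)))
```

With distance_e2 as the only clustering column, two neighbors at exactly the same distance_e2 would share a primary key and the later write would replace the earlier one; adding the neighbor's postal code as a second clustering column is one way to avoid that.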
30. Distance Between Postal Codes
• Use Case
– Estimate the distance between postal
codes
• Solution
– Create a lookup table
– RK: country|postal code
– CN: country|postal code
– Value: distanceE2
31. Distance Between Postal Codes
• Create Column Family
cqlsh>CREATE TABLE zip_code_distance_between
(location_country_1 text, zip_code_1 text,
location_country_2 text, zip_code_2 text, distance_e2 int,
PRIMARY KEY ((location_country_1, zip_code_1),
location_country_2, zip_code_2));
32. Distance Between Postal Codes
• Add Data
cqlsh>INSERT INTO zip_code_distance_between
(location_country_1, zip_code_1, location_country_2,
zip_code_2, distance_e2)
VALUES('US', '78741', 'US', '78741', 0);
cqlsh>INSERT INTO zip_code_distance_between
(location_country_1, zip_code_1, location_country_2,
zip_code_2, distance_e2)
VALUES('US', '78741', 'US', '78702', 180);
33. Distance Between Postal Codes
• Select
cqlsh>SELECT * FROM zip_code_distance_between
WHERE location_country_1 = 'US'
AND zip_code_1 = '78741'
AND location_country_2 = 'US'
AND zip_code_2 = '78702';
• Results
location_country_1 | zip_code_1 | location_country_2 | zip_code_2 | distance_e2
--------------------+------------+--------------------+------------+-------------
US | 78741 | US | 78702 | 180
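The query above only succeeds if a row exists with 78741 as the first postal code. One way to make lookups work in either direction is to write each measured pair twice. Whether the original build did this is not stated, so treat the sketch as an assumption; a plain dict stands in for the table and the function names are hypothetical.

```python
def distance_rows(country_a, zip_a, country_b, zip_b, distance_e2):
    """Return both directed rows for one measured pair of postal codes."""
    return [
        (country_a, zip_a, country_b, zip_b, distance_e2),
        (country_b, zip_b, country_a, zip_a, distance_e2),
    ]


# In-memory stand-in for zip_code_distance_between, keyed like its primary key.
table = {}
for c1, z1, c2, z2, d in distance_rows("US", "78741", "US", "78702", 180):
    table[(c1, z1, c2, z2)] = d


def lookup(country_1, zip_1, country_2, zip_2):
    """Return distance_e2 for the pair, or None when no row exists."""
    return table.get((country_1, zip_1, country_2, zip_2))


print(lookup("US", "78702", "US", "78741"))
```

Writing both directions doubles the storage but keeps every read a single-partition lookup, in line with the prebuild-to-reduce-read-time approach used throughout.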
35. Final Thoughts
• Why just Cassandra?
– Fewer technologies to support
• Operations
• Development
– But be reasonable
• Prebuild reference data
– Consider prebuilding data to reduce read time