Apache Cassandra makes it possible to write code on a laptop and deploy to multi-region clusters with a few configuration changes. But what does it take to create repeatable, scalable, reliable, and observable clusters?
In this talk, Aaron Morton, Co-Founder at The Last Pickle and Apache Cassandra Committer, will discuss the tools and techniques they use. From environment planning to implementation with tools such as Chef, Sensu, Graphite, Riemann, and LogStash, this will be a discussion of the full-stack ecosystem for successful projects.
The Last Pickle: Repeatable, Scalable, Reliable, Observable: Cassandra
1. CASSANDRA SF 2015
REPEATABLE, SCALABLE, RELIABLE,
OBSERVABLE CASSANDRA
Aaron Morton
@aaronmorton
Co-Founder & Principal Consultant
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
2. About The Last Pickle.
Work with clients to deliver and improve Apache Cassandra-based
solutions.
Apache Cassandra Committer, DataStax MVP, Apache
Usergrid Committer.
Based in New Zealand, Australia, & USA.
5. No Look Writes
CREATE TABLE user_visits (
user text,
day int, // YYYYMMDD
PRIMARY KEY (user, day)
);
6. No Look Writes
// Bad: read first to check whether the row exists, then write
SELECT *
FROM user_visits
WHERE user = 'aaron' AND day = 20150924;
INSERT INTO user_visits (user, day)
VALUES ('aaron', 20150924);
7. No Look Writes
// Better: write without reading; repeating the same INSERT leaves the same row
INSERT INTO user_visits (user, day)
VALUES ('aaron', 20150924);
INSERT INTO user_visits (user, day)
VALUES ('aaron', 20150924);
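The two slides above can be compared with a small sketch. This is not Cassandra code: `UserVisits` is a hypothetical in-memory stand-in for the `user_visits` table, and the helper names are assumptions made for illustration. It relies only on the fact that a CQL INSERT on an existing primary key overwrites the row, so checking first adds a read without changing the final state.

```python
class UserVisits:
    """Toy stand-in for the user_visits table; rows are keyed by (user, day)."""

    def __init__(self):
        self.rows = {}
        self.reads = 0  # counts SELECT-style lookups

    def select(self, user, day):
        self.reads += 1
        return self.rows.get((user, day))

    def insert(self, user, day):
        # Writing an existing primary key overwrites it, like a CQL INSERT.
        self.rows[(user, day)] = {"user": user, "day": day}


def record_visit_with_read(table, user, day):
    """The 'Bad' pattern: look before writing."""
    if table.select(user, day) is None:
        table.insert(user, day)


def record_visit_no_look(table, user, day):
    """The 'Better' pattern: just write; repeats are harmless."""
    table.insert(user, day)


if __name__ == "__main__":
    checked, no_look = UserVisits(), UserVisits()
    for _ in range(2):
        record_visit_with_read(checked, "aaron", 20150924)
        record_visit_no_look(no_look, "aaron", 20150924)
    print(checked.rows == no_look.rows, checked.reads, no_look.reads)
```

Both tables end in the same state; the only difference is the number of reads issued along the way.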
21. Concurrent Asynchronous Requests
// request for cities concurrently
SELECT *
FROM hotel_price
WHERE checkin_day = 20150924 AND city = 'Santa Clara';
SELECT *
FROM hotel_price
WHERE checkin_day = 20150924 AND city = 'San Jose';
24. Data Model Smoke Test
/*
* Get Pricing Data
*/
// Load Data
INSERT INTO city_distances (city, distance, nearby_city)
VALUES ('Santa Clara', 0, 'Santa Clara');
INSERT INTO city_distances (city, distance, nearby_city)
VALUES ('Santa Clara', 1, 'San Jose');
INSERT INTO hotel_price (checkin_day, city, hotel_name, price_data)
VALUES (20150924, 'Santa Clara', 'Hilton Santa Clara', 0xFF);
INSERT INTO hotel_price (checkin_day, city, hotel_name, price_data)
VALUES (20150924, 'San Jose', 'Hyatt San Jose', 0xFF);
25. Data Model Smoke Test
// Step 1
// Get the nearby cities for the one selected by the user
SELECT nearby_city
FROM city_distances
WHERE city = 'Santa Clara' AND distance < 2;
// Step 2
// Parallel requests for each city returned.
SELECT city, hotel_name, price_data
FROM hotel_price
WHERE checkin_day = 20150924 AND city = 'Santa Clara';
SELECT city, hotel_name, price_data
FROM hotel_price
WHERE checkin_day = 20150924 AND city = 'San Jose';
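The two-step read above (look up nearby cities, then issue one request per city in parallel) can be sketched on the client side as follows. This is a minimal illustration, not the talk's implementation: the two lists are in-memory stand-ins holding the smoke-test rows, and `nearby_cities`, `hotel_prices`, and `prices_near` are hypothetical names. With a real driver, each call to `hotel_prices` would be an asynchronous single-partition query instead of a list scan.

```python
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-ins for the two tables; rows mirror the smoke-test inserts.
CITY_DISTANCES = [
    ("Santa Clara", 0, "Santa Clara"),
    ("Santa Clara", 1, "San Jose"),
]
HOTEL_PRICE = [
    (20150924, "Santa Clara", "Hilton Santa Clara", b"\xff"),
    (20150924, "San Jose", "Hyatt San Jose", b"\xff"),
]


def nearby_cities(city, max_distance):
    """Step 1: emulate SELECT nearby_city ... WHERE city = ? AND distance < ?"""
    return [n for (c, d, n) in CITY_DISTANCES if c == city and d < max_distance]


def hotel_prices(checkin_day, city):
    """Step 2: emulate one SELECT against hotel_price for a single city."""
    return [
        (c, hotel, price)
        for (day, c, hotel, price) in HOTEL_PRICE
        if day == checkin_day and c == city
    ]


def prices_near(city, checkin_day, max_distance=2):
    """Run step 1, then fan out one step-2 request per nearby city."""
    cities = nearby_cities(city, max_distance)
    results = []
    with ThreadPoolExecutor(max_workers=max(1, len(cities))) as pool:
        futures = [pool.submit(hotel_prices, checkin_day, c) for c in cities]
        for future in futures:  # collect in the order the cities were returned
            results.extend(future.result())
    return results


if __name__ == "__main__":
    for row in prices_near("Santa Clara", 20150924):
        print(row)
```

All requests are submitted before any result is awaited, so the total wait is bounded by the slowest request rather than the sum of all of them.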
55. Disk Smoke Tests
“Disk Latency and Other
Random Numbers”
Al Tobey
http://tobert.github.io/post/2014-11-13-slides-disk-latency-and-other-random-numbers.html