TIME SERIES AGGREGATES USING CASSANDRA, KAIROSDB & ALCHEMY API

CCM AlchemyAPI and Real-time Aggregation

An exploratory look into KairosDB (OpenTSDB) connected to Cassandra (CCM) and using AlchemyAPI for entity, topic and sentiment extraction.

Sprinkled in is a bit of Data Modeling, Truth Tables, Primary Keys, Partition Keys and Cluster Keys.

All written in Python!

Published in: Software, Technology, Business

CCM AlchemyAPI and Real-time Aggregation

  1. TIME SERIES AGGREGATES USING CASSANDRA, KAIROSDB & ALCHEMY API
  2. Who is Victor Anjos?
     • Bio-Informatics Engineer
     • Business Analyst
     • Data Warehouse Specialist
     • System Operations / DevOps
     • Founder & Lead Technologist
     • Presenter, Speaker, Organizer
     • Founder / Do-Gooder
     • Data Engineer & Manager
     KEEP TWEETING @VictorFAnjos @viafoura @AlchemyAPI @Datastax @Data_for_Good
  3. Quick Review…
  4. Why Real-Time?
  5. REMEMBER --- TWEET
  6. Keys in C*
     cqlsh:test> CREATE TABLE example (
             ... field1 int PRIMARY KEY,
             ... field2 int,
             ... field3 int);
  7. Keys in C*
     cqlsh:test> CREATE TABLE example (
             ... field1 int PRIMARY KEY,
             ... field2 int,
             ... field3 int);
     cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES (1,2,3);
     cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES (4,5,6);
     cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES (7,8,9);
  8. Keys in C*
     cqlsh:test> CREATE TABLE example (
             ... field1 int PRIMARY KEY,
             ... field2 int,
             ... field3 int);
     cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES (1,2,3);
     cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES (4,5,6);
     cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES (7,8,9);
     cqlsh:test> SELECT * FROM example;
      field1 | field2 | field3
     --------+--------+--------
           1 |      2 |      3
           4 |      5 |      6
           7 |      8 |      9
  9. Keys in C*
     [default@test] list example;
     -------------------
     RowKey: 1
     => (column=, value=, timestamp=1374546754299000)
     => (column=field2, value=00000002, timestamp=1374546754299000)
     => (column=field3, value=00000003, timestamp=1374546754299000)
     -------------------
     RowKey: 4
     => (column=, value=, timestamp=1374546757815000)
     => (column=field2, value=00000005, timestamp=1374546757815000)
     => (column=field3, value=00000006, timestamp=1374546757815000)
     -------------------
     RowKey: 7
     => (column=, value=, timestamp=1374546761055000)
     => (column=field2, value=00000008, timestamp=1374546761055000)
     => (column=field3, value=00000009, timestamp=1374546761055000)
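The raw listing above prints cell values as hex and timestamps as large integers. A minimal Python sketch of how to read them, assuming (as is standard for Cassandra) that a CQL int is stored as 4 big-endian bytes and that write timestamps are microseconds since the Unix epoch. The helper names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_int(hex_value: str) -> int:
    """Decode a CQL int cell shown as hex (4 bytes, big-endian, signed)."""
    return int.from_bytes(bytes.fromhex(hex_value), byteorder="big", signed=True)


def decode_write_time(micros: int) -> datetime:
    """Convert a cell timestamp (microseconds since the epoch) to UTC."""
    return EPOCH + timedelta(microseconds=micros)


# Values copied from the first row of the listing.
field2 = decode_int("00000002")
written_at = decode_write_time(1374546754299000)
print(field2, written_at.isoformat())
```

The empty `column=` cell in each row carries no value; it only marks that the CQL row exists.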
  11. Keys in C*
      cqlsh:test> CREATE TABLE example (
              ... partitionKey1 text,
              ... partitionKey2 text,
              ... clusterKey1 text,
              ... clusterKey2 text,
              ... normalField1 text,
              ... normalField2 text,
              ... PRIMARY KEY ( (partitionKey1, partitionKey2), clusterKey1, clusterKey2 )
              ... );
  12. Keys in C*
      cqlsh:test> CREATE TABLE example (
              ... partitionKey1 text,
              ... partitionKey2 text,
              ... clusterKey1 text,
              ... clusterKey2 text,
              ... normalField1 text,
              ... normalField2 text,
              ... PRIMARY KEY ( (partitionKey1, partitionKey2), clusterKey1, clusterKey2 )
              ... );
      cqlsh:test> INSERT INTO example (partitionKey1,
              ... partitionKey2, clusterKey1, clusterKey2,
              ... normalField1, normalField2) VALUES (
              ... 'partitionVal1',
              ... 'partitionVal2',
              ... 'clusterVal1',
              ... 'clusterVal2',
              ... 'normalVal1',
              ... 'normalVal2');
  13. Keys in C*
      cqlsh:test> SELECT * FROM example;
       partitionkey1 | partitionkey2 | clusterkey1 | clusterkey2 | normalfield1 | normalfield2
      ---------------+---------------+-------------+-------------+--------------+--------------
       partitionVal1 | partitionVal2 | clusterVal1 | clusterVal2 |   normalVal1 |   normalVal2
  14. Keys in C*
      cqlsh:test> SELECT * FROM example;
       partitionkey1 | partitionkey2 | clusterkey1 | clusterkey2 | normalfield1 | normalfield2
      ---------------+---------------+-------------+-------------+--------------+--------------
       partitionVal1 | partitionVal2 | clusterVal1 | clusterVal2 |   normalVal1 |   normalVal2
      [default@test] list example;
      -------------------
      RowKey: partitionVal1:partitionVal2
      => (column=clusterVal1:clusterVal2:, value=, timestamp=1374630892473000)
      => (column=clusterVal1:clusterVal2:normalfield1, value=6e6f726d616c56616c31, timestamp=1374630892473000)
  15. Keys in C*
      1. The first part of a composite key (inside the inner brackets) is called the “Partition Key”; the rest (not inside the inner brackets) are the “Cluster Keys”.
      2. Cassandra stores columns differently when composite keys are used. The partition key becomes the row key. The cluster key values are concatenated (“:” as separator) and prefixed to each column name to form the stored column names. Column values remain unchanged.
      3. Cluster keys (unlike partition keys) are ordered, so you cannot search on arbitrary columns: after giving the full partition key you have to specify the leading cluster keys exactly, and can run a range query only on the final one you specify.
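Point 2 above can be sketched in a few lines of Python. This is a toy model of the mapping shown in the `list example` output, not Cassandra's actual storage code; the function name and the dict-based row are assumptions made for illustration.

```python
def to_storage_cells(row, partition_keys, cluster_keys):
    """Map one CQL row to (row_key, {stored_column_name: value}).

    Toy model only: partition key values are joined with ":" to form the
    row key, and cluster key values are joined with ":" and prefixed to
    every remaining column name.
    """
    row_key = ":".join(row[name] for name in partition_keys)
    prefix = ":".join(row[name] for name in cluster_keys)
    # The empty-named cell marks that the CQL row exists.
    cells = {prefix + ":": ""}
    for name, value in row.items():
        if name not in partition_keys and name not in cluster_keys:
            cells[prefix + ":" + name] = value
    return row_key, cells


# Column names are lowercase because CQL folds unquoted identifiers.
row = {
    "partitionkey1": "partitionVal1",
    "partitionkey2": "partitionVal2",
    "clusterkey1": "clusterVal1",
    "clusterkey2": "clusterVal2",
    "normalfield1": "normalVal1",
    "normalfield2": "normalVal2",
}
row_key, cells = to_storage_cells(
    row,
    partition_keys=("partitionkey1", "partitionkey2"),
    cluster_keys=("clusterkey1", "clusterkey2"),
)
print(row_key)
for column, value in cells.items():
    print(column, "=>", value)
```

Because the cluster key values lead every stored column name, cells within a partition sort by cluster key, which is what makes the range query in point 3 cheap.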
  16. A bit of data modelling
      USER ACTIVITY DATA MODEL
      CREATE TABLE user_activity (
        ... username varchar,
        ... interaction_time timeuuid,
        ... activity_code varchar,
        ... detail varchar,
        ... PRIMARY KEY (username, interaction_time)
        ... );
      CREATE TABLE user_activity_history (
        ... username varchar,
        ... interaction_date varchar,
        ... interaction_time timeuuid,
        ... activity_code varchar,
        ... detail varchar,
        ... PRIMARY KEY ( ,interaction_time)
        ... );
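The history table adds an `interaction_date` column, which suggests bucketing each user's activity by day so that no single partition grows without bound. The slide does not show the partition key, so the sketch below assumes it is the pair (username, interaction_date) and that the date is a UTC day formatted as YYYY-MM-DD. Both are assumptions, as is the helper name.

```python
from datetime import datetime, timezone


def history_partition(username: str, when: datetime) -> tuple:
    """Return the assumed (username, interaction_date) bucket for an event.

    Assumption: one partition per user per UTC day, with the day stored
    as a 'YYYY-MM-DD' string in interaction_date.
    """
    day = when.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (username, day)


event_time = datetime(2013, 7, 23, 2, 32, 34, tzinfo=timezone.utc)
print(history_partition("victor", event_time))
```

Reading a user's history for a date range then means querying one partition per day in that range.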
  17. Data modelling 4 QUERIES
      FIND A CAR IN A LOT
      CREATE TABLE car_location_index (
        ... make varchar,
        ... model varchar,
        ... colour varchar,
        ... vehicle_id int,
        ... lot_id int,
        ... PRIMARY KEY ((make,model,colour),vehicle_id)
        ... );
  18. Data modelling 4 QUERIES
      FIND A CAR IN A LOT
  19. Data modelling 4 QUERIES
      FIND A CAR IN A LOT
      INSERT INTO car_location_index (make,model,colour,vehicle_id,lot_id) VALUES ('Ford','Mustang','Blue',1234,8675309);
      INSERT INTO car_location_index (make,model,colour,vehicle_id,lot_id) VALUES ('Ford','Mustang','',1234,8675309);
      INSERT INTO car_location_index (make,model,colour,vehicle_id,lot_id) VALUES ('Ford','','Blue',1234,8675309);
      INSERT INTO car_location_index (make,model,colour,vehicle_id,lot_id) VALUES ('Ford','','',1234,8675309);
      INSERT INTO car_location_index (make,model,colour,vehicle_id,lot_id) VALUES ('','Mustang','Blue',1234,8675309);
      INSERT INTO car_location_index (make,model,colour,vehicle_id,lot_id) VALUES ('','Mustang','',1234,8675309);
      INSERT INTO car_location_index (make,model,colour,vehicle_id,lot_id) VALUES ('','','Blue',1234,8675309);
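The seven inserts above are the rows of a truth table: every combination of "known" and "blank" for make, model and colour, except the all-blank one. A minimal Python sketch that generates them, assuming that pattern is what the slide intends. The function name is hypothetical, and the string formatting is for illustration only; a real client should use bound parameters.

```python
from itertools import product


def index_inserts(make, model, colour, vehicle_id, lot_id):
    """Build one INSERT per non-empty subset of (make, model, colour)."""
    statements = []
    values = (make, model, colour)
    for keep in product([True, False], repeat=3):
        if not any(keep):
            continue  # skip the row where all three fields are blank
        m, mo, c = (v if k else "" for v, k in zip(values, keep))
        statements.append(
            "INSERT INTO car_location_index "
            "(make,model,colour,vehicle_id,lot_id) "
            f"VALUES ('{m}','{mo}','{c}',{vehicle_id},{lot_id});"
        )
    return statements


statements = index_inserts("Ford", "Mustang", "Blue", 1234, 8675309)
for statement in statements:
    print(statement)
```

With n searchable fields this writes 2^n - 1 rows per car, so the approach trades write volume and storage for single-partition reads.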
  20. Data modelling 4 QUERIES
      FIND A CAR IN A LOT
      SELECT vehicle_id, lot_id FROM car_location_index
        WHERE make = 'Ford' AND model = '' AND colour = 'Blue';
       vehicle_id | lot_id
      ------------+---------
             1234 | 8675309
      SELECT vehicle_id, lot_id FROM car_location_index
        WHERE make = '' AND model = '' AND colour = 'Blue';
       vehicle_id | lot_id
      ------------+---------
             1234 | 8675309
             8765 | 5551212
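The two queries above work because unknown fields are passed as empty strings, which select exactly one pre-written partition. The following in-memory Python stand-in for `car_location_index` illustrates the lookup. The second car's make and model ('Honda', 'Civic') are invented here, since the slide only shows its ids; the function names are hypothetical too.

```python
from itertools import product

# In-memory stand-in: (make, model, colour) partition key -> list of rows.
car_location_index = {}


def add_car(make, model, colour, vehicle_id, lot_id):
    """Write the car under every non-blank combination of its attributes."""
    values = (make, model, colour)
    for keep in product([True, False], repeat=3):
        if any(keep):
            key = tuple(v if k else "" for v, k in zip(values, keep))
            car_location_index.setdefault(key, []).append((vehicle_id, lot_id))


def find_cars(make="", model="", colour=""):
    """Look up one partition; unknown attributes stay as empty strings."""
    rows = car_location_index.get((make, model, colour), [])
    return sorted(rows)  # vehicle_id is the cluster key, so rows are ordered


add_car("Ford", "Mustang", "Blue", 1234, 8675309)
add_car("Honda", "Civic", "Blue", 8765, 5551212)  # hypothetical make/model

print(find_cars(make="Ford", colour="Blue"))
print(find_cars(colour="Blue"))
```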
  21. Enter KairosDB
      [{
          "name": "archive.file.tracked",
          "datapoints": [[1359788400000, 123], [1359788300000, 13.2], [1359788410000, 23.1]],
          "tags": {
              "host": "server1",
              "data_center": "DC1"
          }
      },
      {
          "name": "archive.file.search",
          "timestamp": 999,
          "value": 321,
          "tags": {"host": "test"}
      }]
      http://localhost:8080/api/v1/datapoints
      http://localhost:8080/api/v1/datapoints/query
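A hedged Python sketch of how the two endpoints above might be used from the standard library. The push body mirrors the JSON on the slide. The query body (`start_relative`, `metrics`, `aggregators`, `sampling`) follows the shape I recall from the KairosDB REST documentation and should be checked against the version in use. The requests are only built here, not sent, and the helper names are hypothetical.

```python
import json
import urllib.request

KAIROS_URL = "http://localhost:8080"  # host and port as shown on the slide


def build_push_request(name, datapoints, tags):
    """Build a POST that writes [timestamp_ms, value] pairs for one metric."""
    body = json.dumps([{"name": name, "datapoints": datapoints, "tags": tags}])
    return urllib.request.Request(
        KAIROS_URL + "/api/v1/datapoints",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_query_request(name, last_minutes):
    """Build a POST asking for per-minute sums over the last N minutes.

    Assumption: the query JSON shape below matches the KairosDB release
    being used.
    """
    query = {
        "start_relative": {"value": last_minutes, "unit": "minutes"},
        "metrics": [{
            "name": name,
            "aggregators": [{
                "name": "sum",
                "sampling": {"value": 1, "unit": "minutes"},
            }],
        }],
    }
    return urllib.request.Request(
        KAIROS_URL + "/api/v1/datapoints/query",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


push = build_push_request(
    "archive.file.tracked",
    [[1359788400000, 123], [1359788300000, 13.2], [1359788410000, 23.1]],
    {"host": "server1", "data_center": "DC1"},
)
query = build_query_request("archive.file.tracked", 60)
print(push.get_method(), push.full_url)
print(query.get_method(), query.full_url)
```

Passing either object to `urllib.request.urlopen` would send it to a running KairosDB instance.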
  22. Sentiment Analysis NLP
  23. Sentiment Analysis NLP
      He loves me
      He loves me not
  24. AlchemyAPI
      AlchemyAPI uses natural language processing technology and machine learning algorithms to extract semantic meta-data from content, such as information on people, places, companies, topics, facts, relationships, authors, and languages.
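To tie sentiment extraction back to time-series aggregates, here is a small Python sketch that averages sentiment scores per minute and emits KairosDB-style `[timestamp_ms, value]` datapoints. It assumes some earlier step (for example a sentiment API call) has already produced a numeric score per tweet; the scores, the one-minute bucket and the function name are all made up for illustration.

```python
from collections import defaultdict

MINUTE_MS = 60_000


def average_per_minute(scored_events):
    """Average (timestamp_ms, score) pairs into one datapoint per minute."""
    buckets = defaultdict(list)
    for timestamp_ms, score in scored_events:
        bucket_start = timestamp_ms - timestamp_ms % MINUTE_MS
        buckets[bucket_start].append(score)
    return [
        [bucket_start, sum(scores) / len(scores)]
        for bucket_start, scores in sorted(buckets.items())
    ]


# Hypothetical scores: positive for "He loves me", negative for "not".
events = [
    (1359788400000, 0.5),
    (1359788410000, -0.5),
    (1359788460000, 1.0),
]
datapoints = average_per_minute(events)
print(datapoints)
```

KairosDB can do this kind of roll-up itself at query time through its aggregators; pre-aggregating on the client is only one option.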
  25. Prep Work…
      https://gist.github.com/vanjos/6169734
      https://code.google.com/p/kairosdb/wiki/GettingStarted
      https://dev.twitter.com & https://apps.twitter.com/
      http://www.alchemyapi.com/api/register.html