C* Summit EU 2013: Analytics On Top of Cassandra and Hadoop


Speaker: Dmitry Mezhensky

Published in: Technology, Economy & Finance


  1. 1. Analytics on top of Cassandra and Hadoop Dmitry Mezhensky | Mirantis Inc #CASSANDRAEU
  2. 2. What we will discuss today ● Analytics on Cassandra using Hadoop ● Various types of statistics & implementation ● Scalability of approach #CASSANDRAEU
  3. 3. Problems ● Too many statistics (more than 100) ● Various types ○ Top N ○ Time series ○ Min/max/average/median ○ Extreme values over a time interval ○ Fraud analysis ● Huge amount of data ● Scalability of approach #CASSANDRAEU
  4. 4. Statistics implementation on Hadoop #CASSANDRAEU
  5. 5. Top N ● Map phase generates <Key, Value> pairs; top N is built by Value ● Reduce phase accumulates values; results are persisted to Cassandra via a custom output format ● A suitable comparator was used for the top N entities in Cassandra #CASSANDRAEU
  6. 6. Top N ● At the write stage to Cassandra, sorting is done by value ● At the read stage, the first N records are the top N values #CASSANDRAEU
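
The Top N flow on the two slides above can be sketched in plain Python. This is only an illustration under stated assumptions, not the speaker's implementation: the map and reduce phases are simulated in memory, the reducer is assumed to sum values, and `write_sorted` is a hypothetical stand-in for the value-ordered layout that the talk obtains from a Cassandra comparator.

```python
from collections import defaultdict

def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for key, value in records:
        yield key, value

def reduce_phase(pairs):
    """Accumulate values per key (assumption: a plain sum)."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def write_sorted(totals):
    """Stand-in for the write stage: keep entries ordered by value, descending.
    Ties are broken by key so the order is deterministic."""
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))

def read_top_n(row, n):
    """Read stage: the first n entries of a value-ordered row are the top n."""
    return row[:n]

# Hypothetical sample data.
records = [("a", 3), ("b", 5), ("a", 4), ("c", 1), ("b", 1)]
row = write_sorted(reduce_phase(map_phase(records)))
top2 = read_top_n(row, 2)
```

Because the ordering is paid for once at write time, a read needs no sorting and only takes a prefix of the row.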
  7. 7. Time series ● Map phase generates <Time, Value> pairs ● Reduce phase accumulates values (behaviour varies by statistic) ● Results are persisted to Cassandra via a custom output format, using one row key per statistic and one column per date #CASSANDRAEU
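
A minimal in-memory sketch of the time series layout described above, assuming ISO-8601 timestamps and day-level columns. The nested dictionary `store` is a hypothetical stand-in for Cassandra (outer key = row key per statistic, inner key = column per date), and the statistic names and sample events are invented for illustration.

```python
from collections import defaultdict

def map_phase(events):
    """Emit (date, value) pairs; the date is the first 10 characters of an ISO timestamp."""
    for timestamp, value in events:
        yield timestamp[:10], value

def reduce_phase(pairs, accumulate):
    """Group values by date and apply a statistic-specific accumulator."""
    grouped = defaultdict(list)
    for date, value in pairs:
        grouped[date].append(value)
    return {date: accumulate(values) for date, values in grouped.items()}

def persist(store, statistic, columns):
    """Write one row per statistic, with one column per date."""
    store.setdefault(statistic, {}).update(columns)

# Hypothetical sample data.
events = [
    ("2013-10-17T09:00:00", 2),
    ("2013-10-17T15:30:00", 5),
    ("2013-10-18T11:00:00", 4),
]
store = {}
persist(store, "daily_sum", reduce_phase(map_phase(events), sum))
persist(store, "daily_max", reduce_phase(map_phase(events), max))
```

Passing the accumulator as a parameter mirrors the slide's note that reduce behaviour differs between statistics while the map phase and the storage layout stay the same.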
  8. 8. Maximum, minimum, extremum on interval ● Max/min values are simple to calculate ● Extremum on an interval is calculated similarly to time series #CASSANDRAEU
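
One way to picture the extremum-on-interval statistic is a reducer that keeps a (min, max) pair per time bucket. The sketch below assumes integer timestamps in seconds and fixed-width intervals; the function names and sample events are hypothetical and not taken from the talk.

```python
def merge(a, b):
    """Merge two (min, max) partial results. The operation is associative,
    so partial results can be combined in any order."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def extremes_per_interval(events, interval_seconds):
    """Return {bucket_start: (min, max)} for (timestamp, value) events."""
    result = {}
    for ts, value in events:
        bucket = ts - ts % interval_seconds  # start of the interval containing ts
        partial = (value, value)
        if bucket in result:
            result[bucket] = merge(result[bucket], partial)
        else:
            result[bucket] = partial
    return result

# Hypothetical sample data: (timestamp in seconds, value).
events = [(0, 5), (30, 2), (59, 9), (60, 4), (119, 7)]
per_minute = extremes_per_interval(events, 60)
```

Since `merge` is associative, the same function could also combine partial results computed on different nodes before a final reduce step.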
  9. 9. Fraud analysis ● Fraud analysis runs after all statistics are calculated ● Processed data is filtered by fraud filters #CASSANDRAEU
  10. 10. Scalability approach ● Data is read from and written to Cassandra only ● Hadoop is elastically scalable ● Cassandra is elastically scalable ● No bottleneck #CASSANDRAEU
  11. 11. Questions? #CASSANDRAEU
  12. 12. Thank you! #CASSANDRAEU