jstein.cassandra.nyc.2011

RealTime Analytics With Cassandra

Published in: Technology, Business

  1. Cassandra as the central nervous system of your distributed systems
     /* Joe Stein http://www.linkedin.com/in/charmalloc @allthingshadoop @cassandranosql @allthingsscala @charmalloc */
     http://www.medialets.com
  2. Overview
     • Architecture
     • Aggregate Metrics/Time Series
     • Implementation Over Cassandra
  3. Medialets Architecture
  4. Medialets
     • Largest deployment of rich media ads for mobile devices
     • Over 300,000,000 devices supported
     • 3-4 TB of new data every day
     • Thousands of services in production
     • Hundreds of thousands of events received every second
     • Response times are measured in microseconds
     • Languages
       – 35% JVM (20% Scala & 10% Java)
       – 30% Ruby
       – 20% C/C++
       – 13% Python
       – 2% Bash
  5. The million foot view
     [Architecture diagram: AdServing, Collection, Kafka, Hadoop, Cassandra, Muse, mysql]
  6. Medialets Aggregate Metrics/Time Series
  7. Let's look at just one data point captured
     • 09/10/2011 11:12:13
     • App = Yahoo!
     • Platform = iOS
     • OS = 4.3.4
     • Device = iPad2,1
     • Resolution = 768x1024
     • Events
       – videoPlayPercent = 38
       – Taste = great
  8. The time series part of it
     • 09/10/2011 11:12:13
       Quarter  Q3
       Month    201109
       Week     201136
       Day      20110910
       Hour     2011091011
       Minute   201109101112
       Second   20110910111213
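
The bucketing on slide 8 can be sketched in a few lines of Java. This is a minimal illustration, not code from the talk: the class name `TimeBuckets` is hypothetical, the date is read as month/day/year, and the week is assumed to follow ISO-8601 numbering (which gives 201136 for this date, matching the slide).

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.IsoFields;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: derives one key per time granularity from a timestamp.
final class TimeBuckets {
    private static String fmt(LocalDateTime t, String pattern) {
        return t.format(DateTimeFormatter.ofPattern(pattern));
    }

    static Map<String, String> of(LocalDateTime t) {
        Map<String, String> keys = new LinkedHashMap<>();
        keys.put("Quarter", "Q" + t.get(IsoFields.QUARTER_OF_YEAR));
        keys.put("Month", fmt(t, "yyyyMM"));
        // Assumption: ISO-8601 week-based year and week number.
        keys.put("Week", String.format(Locale.ROOT, "%d%02d",
                t.get(IsoFields.WEEK_BASED_YEAR),
                t.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR)));
        keys.put("Day", fmt(t, "yyyyMMdd"));
        keys.put("Hour", fmt(t, "yyyyMMddHH"));
        keys.put("Minute", fmt(t, "yyyyMMddHHmm"));
        keys.put("Second", fmt(t, "yyyyMMddHHmmss"));
        return keys;
    }
}
```

Calling `TimeBuckets.of(LocalDateTime.of(2011, 9, 10, 11, 12, 13))` yields one string per granularity; because each key is a fixed-width, zero-padded string, keys of the same granularity also sort chronologically.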
  9. Metrics For Different Wants
     Yahoo! + iOS + 4.3.4 + iPad2,1 + 768x1024
     Yahoo! + videoPlayPercent = 30 + Taste = great
     Yahoo! + Taste = great
     Yahoo! + videoPlayPercent = 30
     iPad2,1 + videoPlayPercent = 30 + Taste = great
     768x1024 + videoPlayPercent = 30 + Taste = great
     iOS + 4.3.4 + iPad2,1
  10. Medialets Implementation Over Cassandra
  11. Storing the time series
     CREATE COLUMN FAMILY ByDay
     WITH default_validation_class=CounterColumnType
     AND key_validation_class=UTF8Type AND comparator=UTF8Type;

     CREATE COLUMN FAMILY ByHour
     WITH default_validation_class=CounterColumnType
     AND key_validation_class=UTF8Type AND comparator=UTF8Type;

     CREATE COLUMN FAMILY ByMinute
     WITH default_validation_class=CounterColumnType
     AND key_validation_class=UTF8Type AND comparator=UTF8Type;

     CREATE COLUMN FAMILY BySecond
     WITH default_validation_class=CounterColumnType
     AND key_validation_class=UTF8Type AND comparator=UTF8Type;

     Column Families hold your rows of data. Each row in each column family will be equal to the time period you are dealing with. So an "event" occurring at 09/10/2011 12:13:14 will become 4 rows:
       BySecond = 20110910121314
       ByMinute = 201109101213
       ByHour = 2011091012
       ByDay = 20110910
  12. Why multiple column families?
     http://www.datastax.com/docs/1.0/configuration/storage_configuration
  13. Generically group by
     • app+platform+osversion+device+resolution
     • app+event1+event2
     • app+event1
     • app+event2
     • device+event1+event2
     • resolution+event1+event2
     • platform+osversion+device
  14. As columns – names are composites
     • app+platform+osversion+device+resolution#Yahoo!+iOS+4.3.4+iPad2,1+768x1024
     • app+event1+event2#Yahoo!+videoPlayPercent=30+Taste=great
     • app+event1#Yahoo!+Taste=great
     • app+event2#Yahoo!+videoPlayPercent=30
     • device+event1+event2#iPad2,1+videoPlayPercent=30+Taste=great
     • resolution+event1+event2#768x1024+videoPlayPercent=30+Taste=great
     • platform+osversion+device#iOS+4.3.4+iPad2,1
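
The composite names on slide 14 follow one pattern: the group-by field names joined with `+`, a `#` separator, then the matching values joined with `+`. A minimal sketch of that pattern, assuming a hypothetical `CompositeColumns` helper and a data point held as a simple field-to-value map (neither is from the talk):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds "<field names>#<field values>" column names.
final class CompositeColumns {
    static String columnName(List<String> groupBy, Map<String, String> point) {
        String names = String.join("+", groupBy);
        String values = groupBy.stream()
                .map(point::get) // look up each field's value in the data point
                .collect(Collectors.joining("+"));
        return names + "#" + values;
    }

    // The data point from the slides; events are stored here as "name=value" strings.
    static Map<String, String> samplePoint() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("app", "Yahoo!");
        p.put("platform", "iOS");
        p.put("osversion", "4.3.4");
        p.put("device", "iPad2,1");
        p.put("resolution", "768x1024");
        p.put("event1", "videoPlayPercent=30");
        p.put("event2", "Taste=great");
        return p;
    }
}
```

Putting the field names first means every column for a given grouping shares a common prefix, which is what makes a range query over that prefix possible later.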
  15. The rows
     • ByHour=2011091011
       – app+platform+osversion+device+resolution#Yahoo!+iOS+4.3.4+iPad2,1+768x1024
       – app+event1+event2#Yahoo!+videoPlayPercent=30+Taste=great
       – app+event1#Yahoo!+Taste=great
       – app+event2#Yahoo!+videoPlayPercent=30
       – device+event1+event2#iPad2,1+videoPlayPercent=30+Taste=great
       – resolution+event1+event2#768x1024+videoPlayPercent=30+Taste=great
       – platform+osversion+device#iOS+4.3.4+iPad2,1
     • ByDay=20110910
       – app+platform+osversion+device+resolution#Yahoo!+iOS+4.3.4+iPad2,1+768x1024
       – app+event1+event2#Yahoo!+videoPlayPercent=30+Taste=great
       – app+event1#Yahoo!+Taste=great
       – app+event2#Yahoo!+videoPlayPercent=30
       – device+event1+event2#iPad2,1+videoPlayPercent=30+Taste=great
       – resolution+event1+event2#768x1024+videoPlayPercent=30+Taste=great
       – platform+osversion+device#iOS+4.3.4+iPad2,1
  16. Inserting data with Hector
     • mutator.insertCounter("20110910", "ByDay", HFactory.createCounterColumn("app+platform+osversion+device+resolution#Yahoo!+iOS+4.3.4+iPad2,1+768x1024", 1))
     • mutator.insertCounter("20110910", "ByDay", HFactory.createCounterColumn("app+event1+event2#Yahoo!+videoPlayPercent=30+Taste=great", 1))
     • mutator.insertCounter("20110910", "ByDay", HFactory.createCounterColumn("app+event1#Yahoo!+Taste=great", 1))
     • mutator.insertCounter("20110910", "ByDay", HFactory.createCounterColumn("app+event2#Yahoo!+videoPlayPercent=30", 1))
     • mutator.insertCounter("20110910", "ByDay", HFactory.createCounterColumn("device+event1+event2#iPad2,1+videoPlayPercent=30+Taste=great", 1))
     • mutator.insertCounter("20110910", "ByDay", HFactory.createCounterColumn("resolution+event1+event2#768x1024+videoPlayPercent=30+Taste=great", 1))
     • mutator.insertCounter("20110910", "ByDay", HFactory.createCounterColumn("platform+osversion+device#iOS+4.3.4+iPad2,1", 1))
  17. Inserting data with Skeletor
     Skeletor is the Scala wrapper of Hector for Cassandra https://github.com/joestein/skeletor

     aggregateColumnNames("AppPlatformOSVersionDeviceResolution") = "app+platform+osversion+device+resolution#"

     def ccAppPlatformOSVersionDeviceResolution(c: (String) => Unit) = {
       c(aggregateColumnNames("AppPlatformOSVersionDeviceResolution") + app + p(platform) + p(osversion) + p(device) + p(resolution))
     }

     // rows we are going to write to
     aggregateKeys(KEYSPACE \ "ByMonth") = month // 201109
     aggregateKeys(KEYSPACE \ "ByDay") = day // 20110910
     aggregateKeys(KEYSPACE \ "ByHour") = hour // 2011091012
     aggregateKeys(KEYSPACE \ "ByMinute") = minute // 201109101213

     def r(columnName: String): Unit = {
       aggregateKeys.foreach { tuple: (ColumnFamily, String) => {
         val (columnFamily, row) = tuple
         if (row != null && row.size > 0)
           rows add (columnFamily -> row has columnName inc) // increment the counter
       }}
     }

     ccAppPlatformOSVersionDeviceResolution(r)
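
The write path above fans one event out into one counter increment per (column family, row key, column name) combination. The following is a plain-Java sketch of that fan-out with no Cassandra client involved; `CounterFanOut`, `Increment`, and `plan` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the fan-out: every column name is incremented
// once in every time-bucket row.
final class CounterFanOut {
    /** One pending "+1": which column family, row key, and column to bump. */
    static final class Increment {
        final String columnFamily;
        final String rowKey;
        final String columnName;

        Increment(String columnFamily, String rowKey, String columnName) {
            this.columnFamily = columnFamily;
            this.rowKey = rowKey;
            this.columnName = columnName;
        }
    }

    static List<Increment> plan(Map<String, String> rowKeyByFamily, List<String> columnNames) {
        List<Increment> out = new ArrayList<>();
        for (Map.Entry<String, String> e : rowKeyByFamily.entrySet()) {
            String rowKey = e.getValue();
            // Mirrors the null/empty guard in the Scala snippet.
            if (rowKey == null || rowKey.isEmpty()) continue;
            for (String column : columnNames) {
                out.add(new Increment(e.getKey(), rowKey, column));
            }
        }
        return out;
    }
}
```

With the four row keys and seven groupings shown in the slides, a single event would produce 4 × 7 = 28 increments, so the number of groupings and time granularities directly multiplies the write volume.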
  18. Retrieving Data
     MultigetSliceCounterQuery
     • setColumnFamily("ByDay")
     • setKeys("20110910")
     • setRange("app+event1#", "app+event1#~", false, 1000)
     • We will get all the apps and counts for event1
     • setRange("app+event2#", "app+event2#~", false, 1000)
     • We will get all the apps and the counts for event2
     By app: tastes great vs. less filling
     • Sample code for the aggregate metrics and retrieving them https://github.com/joestein/apophis
     • What is with the tilde?
  19. Sort for success
     Not magic, just Cassandra
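
The tilde works because the column families use a UTF8Type comparator, so columns within a row are kept in sorted order, and `~` (0x7E) is the highest printable ASCII character. A slice from `prefix` to `prefix~` therefore covers every column that starts with the prefix, as long as the values are plain ASCII. The sketch below imitates this with an in-memory `TreeMap` instead of a Cassandra row; `PrefixSlice` and the `ExampleApp` entry are hypothetical, and it uses the `#` separator from the column names on slide 14.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration: a sorted map stands in for one counter row.
final class PrefixSlice {
    // Inclusive range [prefix, prefix + "~"], like a column slice with start/finish.
    static NavigableMap<String, Long> slice(NavigableMap<String, Long> row, String prefix) {
        return row.subMap(prefix, true, prefix + "~", true);
    }

    static NavigableMap<String, Long> sampleRow() {
        NavigableMap<String, Long> row = new TreeMap<>();
        row.put("app+platform+osversion+device+resolution#Yahoo!+iOS+4.3.4+iPad2,1+768x1024", 1L);
        row.put("app+event1+event2#Yahoo!+videoPlayPercent=30+Taste=great", 1L);
        row.put("app+event1#Yahoo!+Taste=great", 1L);
        row.put("app+event1#ExampleApp+Taste=great", 1L); // made-up second app
        row.put("app+event2#Yahoo!+videoPlayPercent=30", 1L);
        row.put("device+event1+event2#iPad2,1+videoPlayPercent=30+Taste=great", 1L);
        row.put("platform+osversion+device#iOS+4.3.4+iPad2,1", 1L);
        return row;
    }
}
```

Slicing the sample row with the prefix `app+event1#` returns only the two `app+event1#...` columns. The `app+event1+event2#...` column is excluded because `+` sorts after `#`, which places it beyond the `app+event1#~` end of the range.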
  20. A few more things about retrieving data
     • You need to start backwards from here.
     • If you want to do things ad hoc then map/reduce is better
     • Sometimes more rows are better, allowing more nodes to do work
       – If you need to look at 100,000 metrics it is better to pull this out of 100 rows than out of 1
       – Don't be afraid to make CF and composite keys out of Time + Aggregate data
         • 20111023+app=Yahoo!
         • This could be the row that holds ALL of the app information for that day. If you want to look at 100 apps at once with 1000 metrics for each per time period, this could be the way to go
  21. Q&A
     /* Joe Stein
      * http://www.linkedin.com/in/charmalloc
      * @allthingshadoop
      * @cassandranosql
      * @allthingsscala
      * @charmalloc
      * http://github.com/joestein
      */
     Medialets
     The rich media ad platform for mobile.
     connect@medialets.com
     www.medialets.com/showcase
