Intravert Server side processing for Cassandra



  1. 1. Before we get into the heavy stuff, let's imagine hacking around with C* for a bit...
  2. 2. You run a large video website
      ● CREATE TABLE videos (
          videoid uuid,
          videoname varchar,
          username varchar,
          description varchar,
          tags varchar,
          upload_date timestamp,
          PRIMARY KEY (videoid, videoname)
        );
      ● INSERT INTO videos (videoid, videoname, username, description, tags, upload_date)
        VALUES (99051fe9-6a9c-46c2-b949-38ef78858dd0, 'My funny cat', 'ctodd',
                'My cat likes to play the piano! So funny.', 'cats,piano,lol', '2012-06-01 08:00:00');
  3. 3. You have a bajillion users
      ● CREATE TABLE users (
          username varchar,
          firstname varchar,
          lastname varchar,
          email varchar,
          password varchar,
          created_date timestamp,
          PRIMARY KEY (username)
        );
      ● INSERT INTO users (username, firstname, lastname, email, password, created_date)
        VALUES ('tcodd', 'Ted', 'Codd', , '5f4dcc3b5aa765d61d8327deb882cf99', '2011-06-01 08:00:00');
  4. 4. You can query up a storm
      ● SELECT firstname, lastname FROM users WHERE username = 'tcodd';

         firstname | lastname
        -----------+----------
         Ted       | Codd

      ● SELECT * FROM videos WHERE videoid = b3a76c6b-7c7f-4af6-964f-803a9283c401 AND videoname > 'N';

         videoid     | b3a76c6b-7c7f-4af6-964f-803a9283c401
         videoname   | Now my dog plays piano!
         description | My dog learned to play the piano because of the cat.
         tags        | dogs,piano,lol
         upload_date | 2012-08-30 16:50:00+0000
         username    | ctodd
  5. 5. That's great! Then you ask yourself...
  6. 6. ● Can I slice a slice (or sub query)?
        ● Can I do advanced where clauses?
        ● Can I union two slices server side?
        ● Can I join data from two tables without two request/response round trips?
        ● What about procedures?
        ● Can I write functions or aggregation functions?
  7. 7. Let's look at the APIs we have
  8. 8. But none of those APIs do what I want, and it seems simple enough to do...
  9. 9. Intravert joins the “party” at the API Layer
  10. 10. Why not just do it client side?
      ● Move processing close to data
        – Idea borrowed from Hadoop
      ● Doing work close to the source can result in:
        – Less network IO
        – Less memory spent encoding/decoding throwaway data
        – New storage and access paradigms
  11. 11. Vert.x + Cassandra
      ● What is Vert.x?
        – Distributed Event Bus which spans the server and even penetrates into the client side for effortless real-time web applications
      ● What are the cool features?
        – Asynchronous
        – Hot re-loadable modules
        – Modules can be written in Groovy, Ruby, Java, JavaScript
  12. 12. Transport, payload, and batching
  13. 13. HTTP Transport
      ● HTTP is easy to use on firewalled networks
      ● Easy to secure
      ● Easy to compress
      ● The de facto way to do everything anyway
      ● IntraVert attempts to limit round trips
        – Not to provide a terse binary format
  14. 14. JSON Payload
      ● Simple nested types like list, map, String
      ● Request is composed of N operations
      ● Each operation has a configurable timeout
      ● Again, IntraVert attempts to limit round trips
        – Not to provide a terse message format
  15. 15. Why not use lightning-fast transport and serialization library X?
      ● X's language/code-gen issues
      ● You probably cannot tcpdump X
      ● Net-admins don't like 90 jars for health checks
      ● IntraVert attempts to limit round trips:
        – Prepared statements
        – Server side filtering
        – Other cool stuff
  16. 16. Sample request and response
      Request:
        {"e": [
          {
            "type": "SETKEYSPACE",
            "op": { "keyspace": "myks" }
          },
          {
            "type": "SETCOLUMNFAMILY",
            "op": { "columnfamily": "mycf" }
          },
          {
            "type": "SLICE",
            "op": {
              "rowkey": "beers",
              "start": "Allagash",
              "end": "Sierra Nevada",
              "size": 9
            }
          }
        ]}
      Response:
        {
          "exception": null,
          "exceptionId": null,
          "opsRes": {
            "0": "OK",
            "1": "OK",
            "2": [{
              "name": "Founders",
              "value": "Breakfast Stout"
            }]
          }
        }
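A request like the sample on slide 16 is just a list of operations under the key "e", so it can be assembled from plain maps and lists before being sent over HTTP. The sketch below is not IntraVert client code: the class name `IntraRequestSketch`, its helpers, and the tiny JSON writer are all hypothetical, and only the operation names and fields are taken from the slide.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of IntraVert: builds the slide-16 request body.
final class IntraRequestSketch {

    // One operation has the shape {"type": ..., "op": {...}}.
    static Map<String, Object> op(String type, Map<String, Object> body) {
        Map<String, Object> operation = new LinkedHashMap<>();
        operation.put("type", type);
        operation.put("op", body);
        return operation;
    }

    // Builds an op body from alternating key/value arguments, keeping insertion order.
    static Map<String, Object> body(Object... keyValues) {
        Map<String, Object> map = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            map.put((String) keyValues[i], keyValues[i + 1]);
        }
        return map;
    }

    // Minimal JSON writer: maps, lists, strings, numbers, booleans and null only.
    // It escapes backslashes and double quotes but not control characters.
    static String toJson(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof String) {
            String s = (String) value;
            return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        }
        if (value instanceof Map) {
            List<String> parts = new ArrayList<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                parts.add(toJson(String.valueOf(e.getKey())) + ":" + toJson(e.getValue()));
            }
            return "{" + String.join(",", parts) + "}";
        }
        if (value instanceof List) {
            List<String> parts = new ArrayList<>();
            for (Object item : (List<?>) value) {
                parts.add(toJson(item));
            }
            return "[" + String.join(",", parts) + "]";
        }
        return String.valueOf(value); // numbers and booleans
    }

    // Three operations batched into a single request, as on the slide.
    static String sliceRequest() {
        List<Object> ops = new ArrayList<>();
        ops.add(op("SETKEYSPACE", body("keyspace", "myks")));
        ops.add(op("SETCOLUMNFAMILY", body("columnfamily", "mycf")));
        ops.add(op("SLICE", body("rowkey", "beers", "start", "Allagash",
                                 "end", "Sierra Nevada", "size", 9)));
        Map<String, Object> request = new LinkedHashMap<>();
        request.put("e", ops);
        return toJson(request);
    }

    public static void main(String[] args) {
        System.out.println(sliceRequest());
    }
}
```

Because all three operations travel in one body, the client pays for one round trip, and the response can report a result per operation index ("0", "1", "2") as in the sample.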
  17. 17. Server side filter
  18. 18. Imagine your data looks like...
      { "rowkey": "beers", "name": "Allagash", "value": "Allagash Tripel" }
      { "rowkey": "beers", "name": "Founders", "value": "Breakfast Stout" }
      { "rowkey": "beers", "name": "DogfishHead", "value": "Hellhound IPA" }
  19. 19. Application requirement
      ● The user wants to know which beers are “Breakfast Stout”(s)
      ● Common “solutions”:
        – Write a copy of the data sorted by type
        – Request all the data and parse on client side
  20. 20. Using an IntraVert filter
      ● Send a function to the server
      ● Function is applied to subsequent get or slice operations
      ● Only results of the filter are returned to the client
  21. 21. Defining a filter: JavaScript
      ● Syntax to create a filter
        {
          "type": "CREATEFILTER",
          "op": {
            "name": "stouts",
            "spec": "javascript",
            "value": "function(row) { if (row['value'] == 'Breakfast Stout') return row; else return null; }"
          }
        },
  22. 22. Defining a filter: Groovy/Java
      ● We can define a Groovy closure or Java filter
        {
          "type": "CREATEFILTER",
          "op": {
            "name": "stouts",
            "spec": "groovy",
            "value": "{ row -> if (row['value'] == 'Breakfast Stout') return row else return null }"
          }
        },
  23. 23. Filter flow
  24. 24. Common filter use cases
      ● Transform data
      ● Prune columns/rows like a where clause
      ● Extract data from complex fields (JSON, XML, protobuf, etc.)
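The filter behaviour from slides 20 to 24 boils down to: run a function over every column a get or slice produced, and keep only what it returns. The sketch below imitates that on the beer data from slide 18 in plain Java. It is an illustration of the idea only; `FilterFlowSketch`, `applyFilter`, and `STOUTS` are hypothetical names, not IntraVert classes, and the real server applies the filter inside the request pipeline rather than on an in-memory list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical illustration of a server-side filter pass, not IntraVert code.
final class FilterFlowSketch {

    // Same rule as the "stouts" filter on the slides: return the row or null.
    static final UnaryOperator<Map<String, Object>> STOUTS =
            row -> "Breakfast Stout".equals(row.get("value")) ? row : null;

    // Applies the filter to each result; null means "drop this row".
    static List<Map<String, Object>> applyFilter(
            List<Map<String, Object>> sliceResult,
            UnaryOperator<Map<String, Object>> filter) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> row : sliceResult) {
            Map<String, Object> out = filter.apply(row);
            if (out != null) {
                kept.add(out);
            }
        }
        return kept;
    }

    static Map<String, Object> column(String rowkey, String name, String value) {
        Map<String, Object> c = new LinkedHashMap<>();
        c.put("rowkey", rowkey);
        c.put("name", name);
        c.put("value", value);
        return c;
    }

    // The three columns shown on slide 18.
    static List<Map<String, Object>> beers() {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(column("beers", "Allagash", "Allagash Tripel"));
        rows.add(column("beers", "Founders", "Breakfast Stout"));
        rows.add(column("beers", "DogfishHead", "Hellhound IPA"));
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(applyFilter(beers(), STOUTS));
    }
}
```

A filter that returns a modified copy of the row instead of the row itself covers the "transform" and "extract" use cases with the same loop.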
  25. 25. Some light relief
  26. 26. Server Side Multi-Processor
  27. 27. It's the cure for your “redis envy”
  28. 28. Imagine your data looks like...
      ● { "row key": "1", name: "a", val... }
      ● { "row key": "1", name: "b", val... }
      ● { "row key": "4", name: "a", val... }
      ● { "row key": "4", name: "z", val... }
  29. 29. Application Requirements
      ● User wishes to intersect the column names of two slices/queries
      ● Common “solutions”
        – Pull all results to client and apply the intersection there
  30. 30. Server Side MultiProcessor
      ● Send a class that implements the MultiProcessor interface to the server
      ● public List<Map> multiProcess(Map<Integer,Object> input, Map params);
      ● Do one or more get/slice operations as input
      ● Invoke MultiProcessor on input
  31. 31. Multi-processor flow
  32. 32. Multi-processor use cases
      ● Union N slices
      ● Intersect N slices
      ● Some “Join” scenarios
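To make the intersection case from slides 28 to 32 concrete, here is a rough sketch of a multi-processor that keeps only the column names present in every input slice. The nested `MultiProcessor` interface is declared locally with the signature shown on slide 30; it stands in for the real IntraVert interface. It is also an assumption that each value in `input` is a `List<Map>` of columns keyed by the id of the operation that produced it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; raw types mirror the signature shown on the slide.
@SuppressWarnings({"rawtypes", "unchecked"})
final class IntersectionSketch {

    // Local stand-in for the interface named on slide 30.
    interface MultiProcessor {
        List<Map> multiProcess(Map<Integer, Object> input, Map params);
    }

    // Returns one {"name": ...} map per column name found in all inputs.
    static final class NameIntersection implements MultiProcessor {
        @Override
        public List<Map> multiProcess(Map<Integer, Object> input, Map params) {
            Set<Object> common = null;
            for (Object result : input.values()) {
                Set<Object> names = new LinkedHashSet<>();
                for (Map column : (List<Map>) result) { // assumed result shape
                    names.add(column.get("name"));
                }
                if (common == null) {
                    common = names;
                } else {
                    common.retainAll(names);
                }
            }
            List<Map> out = new ArrayList<>();
            if (common != null) {
                for (Object name : common) {
                    Map column = new LinkedHashMap();
                    column.put("name", name);
                    out.add(column);
                }
            }
            return out;
        }
    }

    static Map column(String rowKey, String name) {
        Map c = new LinkedHashMap();
        c.put("rowkey", rowKey);
        c.put("name", name);
        return c;
    }

    // Two slices shaped like slide 28: row "1" has a, b; row "4" has a, z.
    static List<Map> demo() {
        Map<Integer, Object> input = new LinkedHashMap<>();
        input.put(0, List.of(column("1", "a"), column("1", "b")));
        input.put(1, List.of(column("4", "a"), column("4", "z")));
        return new NameIntersection().multiProcess(input, new LinkedHashMap());
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Swapping `retainAll` for `addAll` turns the same skeleton into a union of N slices.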
  33. 33. Fat client becomes the Phat client
  34. 34. Imagine you want to insert this data
      ● User wishes to enter this event for multiple column families
        – 09/10/2011 11:12:13
        – App=Yahoo
        – Platform=iOS
        – OS=4.3.4
        – Device=iPad2,1
        – Resolution=768x1024
        – Events
          – videoPlayPercent=38
          – Taste=great
  35. 35. Inserting the data
      aggregateColumnNames("AppPlatformOSVersionDeviceResolution") = "app+platform+osversion+device+resolution#"

      def ccAppPlatformOSVersionDeviceResolution(c: (String) => Unit) = {
        c(aggregateColumnNames("AppPlatformOSVersionDeviceResolution") + app + p(platform) + p(osversion) + p(device) + p(resolution))
      }

      aggregateKeys(KEYSPACE "ByMonth") = month   //201109
      aggregateKeys(KEYSPACE "ByDay") = day       //20110910
      aggregateKeys(KEYSPACE "ByHour") = hour     //2011091012
      aggregateKeys(KEYSPACE "ByMinute") = minute //201109101213

      def r(columnName: String): Unit = {
        aggregateKeys.foreach { tuple: (ColumnFamily, String) => {
            val (columnFamily, row) = tuple
            if (row != null && row.size > 0)
              rows add (columnFamily -> row has columnName inc) //increment the counter
          }
        }
      }

      ccAppPlatformOSVersionDeviceResolution(r)
  36. 36. Solution
      ● Send the data once and compute the N permutations on the server side

      public void process(JsonObject request, JsonObject state, JsonObject response, EventBus eb) {
        // Read the parameters sent once by the client.
        JsonObject params = request.getObject("mpparams");
        String uid = params.getString("userid");
        String fname = params.getString("fname");
        String lname = params.getString("lname");
        String city = params.getString("city");
        // Build one column per field on the row keyed by the user id.
        RowMutation rm = new RowMutation("myks", IntraService.byteBufferForObject(uid));
        QueryPath qp = new QueryPath("users", null, IntraService.byteBufferForObject("fname"));
        rm.add(qp, IntraService.byteBufferForObject(fname), System.nanoTime());
        QueryPath qp2 = new QueryPath("users", null, IntraService.byteBufferForObject("lname"));
        rm.add(qp2, IntraService.byteBufferForObject(lname), System.nanoTime());
        ...
        try {
          StorageProxy.mutate(mutations, ConsistencyLevel.ONE);
          // Report success only when the write went through.
          response.putString("status", "OK");
        } catch (WriteTimeoutException | UnavailableException | OverloadedException e) {
          e.printStackTrace();
          response.putString("status", "FAILED");
        }
      }
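For the event on slide 34, "computing the N permutations" means turning one timestamp into a row key per time-bucket column family and incrementing the same composite column in each. The sketch below shows only that key arithmetic, with no Cassandra calls. `BucketKeysSketch` and its methods are hypothetical, the "+" separator between column-name parts is an assumption, and the keys are taken straight from the timestamp as given, with no time zone conversion.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the server-side fan-out; it computes keys only.
final class BucketKeysSketch {

    // One row key per time-bucket column family, coarsest first.
    static Map<String, String> bucketKeys(LocalDateTime ts) {
        Map<String, String> keys = new LinkedHashMap<>();
        keys.put("ByMonth", ts.format(DateTimeFormatter.ofPattern("yyyyMM")));
        keys.put("ByDay", ts.format(DateTimeFormatter.ofPattern("yyyyMMdd")));
        keys.put("ByHour", ts.format(DateTimeFormatter.ofPattern("yyyyMMddHH")));
        keys.put("ByMinute", ts.format(DateTimeFormatter.ofPattern("yyyyMMddHHmm")));
        return keys;
    }

    // Composite counter column; the "+" separator is an assumption.
    static String columnName(String app, String platform, String osVersion,
                             String device, String resolution) {
        return "app+platform+osversion+device+resolution#"
                + String.join("+", app, platform, osVersion, device, resolution);
    }

    // One "columnFamily/rowKey/column" entry per counter to increment.
    static List<String> counterIncrements(LocalDateTime ts, String column) {
        List<String> increments = new ArrayList<>();
        for (Map.Entry<String, String> e : bucketKeys(ts).entrySet()) {
            increments.add(e.getKey() + "/" + e.getValue() + "/" + column);
        }
        return increments;
    }

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.of(2011, 9, 10, 11, 12, 13);
        String column = columnName("Yahoo", "iOS", "4.3.4", "iPad2,1", "768x1024");
        counterIncrements(ts, column).forEach(System.out::println);
    }
}
```

With this running next to the data, the client sends the event once and the server produces the four increments, instead of the client issuing one write per column family.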
  37. 37. Service Processor Flow
  38. 38. IntraVert status
      ● Still pre-1.0
      ● Good docs –
      ● Functionally equivalent to Thrift (most features ported)
      ● CQL support
      ● Virgil (coming soon)
      ● HBase-like scanners (coming soon)
  39. 39. Hack at it
  40. 40. Questions?