Intravert Server side processing for Cassandra


Usage Rights

© All Rights Reserved

    Presentation Transcript

    • Before we get into the heavy stuff, let's imagine hacking around with C* for a bit...
    • You run a large video website
      ● CREATE TABLE videos (
          videoid uuid,
          videoname varchar,
          username varchar,
          description varchar,
          tags varchar,
          upload_date timestamp,
          PRIMARY KEY (videoid, videoname)
        );
      ● INSERT INTO videos (videoid, videoname, username, description, tags, upload_date)
        VALUES (99051fe9-6a9c-46c2-b949-38ef78858dd0, 'My funny cat', 'ctodd',
                'My cat likes to play the piano! So funny.', 'cats,piano,lol', '2012-06-01 08:00:00');
    • You have a bajillion users
      ● CREATE TABLE users (
          username varchar,
          firstname varchar,
          lastname varchar,
          email varchar,
          password varchar,
          created_date timestamp,
          PRIMARY KEY (username)
        );
      ● INSERT INTO users (username, firstname, lastname, email, password, created_date)
        VALUES ('tcodd', 'Ted', 'Codd', 'tcodd@relational.com',
                '5f4dcc3b5aa765d61d8327deb882cf99', '2011-06-01 08:00:00');
    • You can query up a storm
      ● SELECT firstname, lastname FROM users WHERE username='tcodd';
         firstname | lastname
        -----------+----------
               Ted | Codd
      ● SELECT * FROM videos WHERE videoid = b3a76c6b-7c7f-4af6-964f-803a9283c401 and videoname > 'N';
         videoid | videoname | description | tags | upload_date | username
         b3a76c6b-7c7f-4af6-964f-803a9283c401 | Now my dog plays piano! | My dog learned to play the piano because of the cat. | dogs,piano,lol | 2012-08-30 16:50:00+0000 | ctodd
    • That's great! Then you ask yourself...
    • Questions
      ● Can I slice a slice (or sub query)?
      ● Can I do advanced where clauses?
      ● Can I union two slices server side?
      ● Can I join data from two tables without two request/response round trips?
      ● What about procedures?
      ● Can I write functions or aggregation functions?
    • Let's look at the APIs we have: http://www.slideshare.net/aaronmorton/apachecon-nafeb2013
    • But none of those APIs do what I want, and it seems simple enough to do...
    • Intravert joins the “party” at the API Layer
    • Why not just do it client side?
      ● Move processing close to data
        – Idea borrowed from Hadoop
      ● Doing work close to the source can result in:
        – Less network IO
        – Less memory spent encoding/decoding throw-away data
        – New storage and access paradigms
    • Vert.x + Cassandra
      ● What is Vert.x?
        – Distributed Event Bus which spans the server and even penetrates into client side for effortless real-time web applications
      ● What are the cool features?
        – Asynchronous
        – Hot re-loadable modules
        – Modules can be written in Groovy, Ruby, Java, JavaScript
      http://vertx.io
    • Transport, payload, and batching
    • HTTP Transport
      ● HTTP is easy to use on firewalled networks
      ● Easy to secure
      ● Easy to compress
      ● The de facto way to do everything anyway
      ● IntraVert attempts to limit round-trips
        – Not provide a terse binary format
    • JSON Payload
      ● Simple nested types like list, map, String
      ● Request is composed of N operations
      ● Each operation has a configurable timeout
      ● Again, IntraVert attempts to limit round-trips
        – Not provide a terse message format
    • Why not use lightning-fast transport and serialization library X?
      ● X's language/code gen issues
      ● You probably cannot tcpdump X
      ● Net-admins don't like 90 jars for health checks
      ● IntraVert attempts to limit round-trips:
        – Prepared statements
        – Server side filtering
        – Other cool stuff
    • Sample request and response
      ● Request:
        {"e": [
          { "type": "SETKEYSPACE", "op": { "keyspace": "myks" } },
          { "type": "SETCOLUMNFAMILY", "op": { "columnfamily": "mycf" } },
          { "type": "SLICE", "op": { "rowkey": "beers", "start": "Allagash", "end": "Sierra Nevada", "size": 9 } }
        ]}
      ● Response:
        { "exception": null,
          "exceptionId": null,
          "opsRes": {
            "0": "OK",
            "1": "OK",
            "2": [{ "name": "Founders", "value": "Breakfast Stout" }]
          }
        }
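The request and response shapes on this slide can be exercised without a server. The following is a minimal sketch, assuming only the field names visible on the slide (`e`, `type`, `op`, `opsRes`, `exception`, `exceptionId`); the helpers `op` and `resultOf` are hypothetical names introduced for illustration, and the response is the slide's sample, not output from a live IntraVert instance.

```javascript
// Hypothetical helper: wraps one operation in the {type, op} shape used on the slide.
function op(type, args) {
  return { type: type, op: args };
}

// One request carries three operations, so a single round trip does all the work.
const request = {
  e: [
    op("SETKEYSPACE", { keyspace: "myks" }),
    op("SETCOLUMNFAMILY", { columnfamily: "mycf" }),
    op("SLICE", { rowkey: "beers", start: "Allagash", end: "Sierra Nevada", size: 9 }),
  ],
};
const requestBody = JSON.stringify(request);

// Sample response copied from the slide (not produced by a live server).
const responseBody = JSON.stringify({
  exception: null,
  exceptionId: null,
  opsRes: { "0": "OK", "1": "OK", "2": [{ name: "Founders", value: "Breakfast Stout" }] },
});

// Hypothetical helper: returns the result of the operation at position `index`.
// Results are keyed by the operation's position in the request's "e" array.
function resultOf(response, index) {
  if (response.exception !== null) {
    throw new Error("operation " + response.exceptionId + " failed: " + response.exception);
  }
  return response.opsRes[String(index)];
}

const response = JSON.parse(responseBody);
const sliceResult = resultOf(response, 2);
console.log(sliceResult);
```

Keying results by operation position is what lets several operations share one HTTP exchange: the client matches result `2` to the third entry it sent.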
    • Server side filter
    • Imagine your data looks like...
        { "rowkey": "beers", "name": "Allagash", "value": "Allagash Tripel" }
        { "rowkey": "beers", "name": "Founders", "value": "Breakfast Stout" }
        { "rowkey": "beers", "name": "DogfishHead", "value": "Hellhound IPA" }
    • Application requirement
      ● User request wishes to know which beers are “Breakfast Stout”(s)
      ● Common “solutions”:
        – Write a copy of the data sorted by type
        – Request all the data and parse on client side
    • Using an IntraVert filter
      ● Send a function to the server
      ● Function is applied to subsequent get or slice operations
      ● Only results of the filter are returned to the client
    • Defining a filter: JavaScript
      ● Syntax to create a filter
        { "type": "CREATEFILTER",
          "op": {
            "name": "stouts",
            "spec": "javascript",
            "value": "function(row) { if (row['value'] == 'Breakfast Stout') return row; else return null; }"
          }
        },
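To see what the "stouts" filter would keep, the same function can be run locally over the three sample columns from the earlier slide. This is only a sketch of the filter's logic: `applyFilter` is a hypothetical stand-in for whatever the server does with the filter's return values, assuming (as the slide's `return null` suggests) that a null result drops the column.

```javascript
// The three sample columns of the "beers" row shown earlier in the deck.
const rows = [
  { rowkey: "beers", name: "Allagash", value: "Allagash Tripel" },
  { rowkey: "beers", name: "Founders", value: "Breakfast Stout" },
  { rowkey: "beers", name: "DogfishHead", value: "Hellhound IPA" },
];

// Same logic as the "stouts" filter body: keep the column or return null.
function stouts(row) {
  if (row["value"] === "Breakfast Stout") return row;
  return null;
}

// Hypothetical stand-in for the server-side filter step (an assumption,
// not IntraVert code): apply the filter to each column, drop the nulls.
function applyFilter(filter, input) {
  return input.map(filter).filter((r) => r !== null);
}

const kept = applyFilter(stouts, rows);
console.log(kept.map((r) => r.name)); // prints [ 'Founders' ]
```

Only one of the three columns would cross the network, which is the saving the filter slides are arguing for.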
    • Defining a filter: Groovy/Java
      ● We can define a Groovy closure or Java filter
        { "type": "CREATEFILTER",
          "op": {
            "name": "stouts",
            "spec": "groovy",
            "value": "{ row -> if (row['value'] == 'Breakfast Stout') return row else return null }"
          }
        },
    • Filter flow
    • Common filter use cases
      ● Transform data
      ● Prune columns/rows like a where clause
      ● Extract data from complex fields (json, xml, protobuf, etc)
    • Some light relief
    • Server Side Multi-Processor
    • It's the cure for your “redis envy”
    • Imagine your data looks like...
      ● { "row key": "1", name: "a", val... }
      ● { "row key": "1", name: "b", val... }
      ● { "row key": "4", name: "a", val... }
      ● { "row key": "4", name: "z", val... }
    • Application Requirements
      ● User wishes to intersect the column names of two slices/queries
      ● Common “solutions”
        – Pull all results to client and apply the intersection there
    • Server Side MultiProcessor
      ● Send a class that implements the MultiProcessor interface to the server
      ● public List<Map> multiProcess(Map<Integer,Object> input, Map params);
      ● Do one or more get/slice operations as input
      ● Invoke MultiProcessor on input
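The intersection requirement can be sketched as a plain function shaped like `multiProcess(input, params)`. This is an illustration in JavaScript rather than a class implementing the Java interface; the assumptions are that `input` maps an operation's position to the list of columns it returned, and that `params.left` and `params.right` (hypothetical parameter names) say which two results to intersect. The column values are made up, since the slide elides them.

```javascript
// Sketch of a multi-processor body: intersect the column names of two slices.
// `input` maps operation index -> columns returned by that operation (assumed shape).
function intersectColumnNames(input, params) {
  const left = input[params.left];
  const right = input[params.right];
  const rightNames = new Set(right.map((column) => column.name));
  // Keep only the names present in both slices.
  return left
    .filter((column) => rightNames.has(column.name))
    .map((column) => ({ name: column.name }));
}

// Slices of row "1" (columns a, b) and row "4" (columns a, z), as on the data slide.
// The values are placeholders invented for this example.
const input = {
  0: [{ name: "a", value: "v1" }, { name: "b", value: "v2" }],
  1: [{ name: "a", value: "v3" }, { name: "z", value: "v4" }],
};

const common = intersectColumnNames(input, { left: 0, right: 1 });
console.log(common); // prints [ { name: 'a' } ]
```

Run on the server, only the single shared name would be returned instead of both full slices.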
    • Multi-processor flow
    • Multi-processor use cases
      ● Union N slices
      ● Intersect N slices
      ● Some “Join” scenarios
    • Fat client becomes the Phat client
    • Imagine you want to insert this data
      ● User wishes to enter this event for multiple column families
        – 09/10/2011 11:12:13
        – App=Yahoo
        – Platform=iOS
        – OS=4.3.4
        – Device=iPad2,1
        – Resolution=768x1024
        – Events
          – videoPlayPercent=38
          – Taste=great
      http://www.slideshare.net/charmalloc/jsteincassandranyc2011
    • Inserting the data
        aggregateColumnNames("AppPlatformOSVersionDeviceResolution") = "app+platform+osversion+device+resolution#"
        def ccAppPlatformOSVersionDeviceResolution(c: (String) => Unit) = {
          c(aggregateColumnNames("AppPlatformOSVersionDeviceResolution") + app + p(platform) + p(osversion) + p(device) + p(resolution))
        }
        aggregateKeys(KEYSPACE "ByMonth") = month //201109
        aggregateKeys(KEYSPACE "ByDay") = day //20110910
        aggregateKeys(KEYSPACE "ByHour") = hour //2011091012
        aggregateKeys(KEYSPACE "ByMinute") = minute //201109101213
        def r(columnName: String): Unit = {
          aggregateKeys.foreach { tuple: (ColumnFamily, String) => {
            val (columnFamily, row) = tuple
            if (row != null && row.size > 0)
              rows add (columnFamily -> row has columnName inc) //increment the counter
          } }
        }
        ccAppPlatformOSVersionDeviceResolution(r)
      http://www.slideshare.net/charmalloc/jsteincassandranyc2011
    • Solution
      ● Send the data once and compute the N permutations on the server side
        public void process(JsonObject request, JsonObject state, JsonObject response, EventBus eb) {
          JsonObject params = request.getObject("mpparams");
          String uid = params.getString("userid");
          String fname = params.getString("fname");
          String lname = params.getString("lname");
          String city = params.getString("city");
          RowMutation rm = new RowMutation("myks", IntraService.byteBufferForObject(uid));
          QueryPath qp = new QueryPath("users", null, IntraService.byteBufferForObject("fname"));
          rm.add(qp, IntraService.byteBufferForObject(fname), System.nanoTime());
          QueryPath qp2 = new QueryPath("users", null, IntraService.byteBufferForObject("lname"));
          rm.add(qp2, IntraService.byteBufferForObject(lname), System.nanoTime());
          ...
          try {
            StorageProxy.mutate(mutations, ConsistencyLevel.ONE);
            // Report success only when the write went through, so a failure status is not overwritten.
            response.putString("status", "OK");
          } catch (WriteTimeoutException | UnavailableException | OverloadedException e) {
            e.printStackTrace();
            response.putString("status", "FAILED");
          }
        }
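The fan-out this slide proposes to move to the server can be sketched for the event from the earlier slide: one event in, one counter increment per time bucket out. Everything below is an illustrative assumption rather than IntraVert or Cassandra code: the function names, the "#" separator in the column name, and the use of UTC for bucketing are all choices made for this example. Only the bucket names (ByMonth, ByDay, ByHour, ByMinute) and the event fields come from the slides.

```javascript
// Zero-pad a number to two digits.
function pad2(n) {
  return String(n).padStart(2, "0");
}

// Derive one row key per time bucket from the event timestamp (UTC, an assumption).
function bucketKeys(date) {
  const month = String(date.getUTCFullYear()) + pad2(date.getUTCMonth() + 1);
  const day = month + pad2(date.getUTCDate());
  const hour = day + pad2(date.getUTCHours());
  const minute = hour + pad2(date.getUTCMinutes());
  return { ByMonth: month, ByDay: day, ByHour: hour, ByMinute: minute };
}

// Expand one event into N counter increments, one per bucket.
// The "#"-joined column name is a made-up format for this sketch.
function fanOut(event) {
  const columnName = [event.app, event.platform, event.os, event.device, event.resolution].join("#");
  return Object.entries(bucketKeys(event.time)).map(([columnFamily, rowKey]) => ({
    columnFamily: columnFamily,
    rowKey: rowKey,
    columnName: columnName,
    increment: 1,
  }));
}

const event = {
  time: new Date(Date.UTC(2011, 8, 10, 11, 12, 13)), // 10 September 2011, 11:12:13 UTC
  app: "Yahoo",
  platform: "iOS",
  os: "4.3.4",
  device: "iPad2,1",
  resolution: "768x1024",
};

const increments = fanOut(event);
console.log(increments);
```

With this expansion done server side, the client sends the event once and the four writes never cross the network individually.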
    • Service Processor Flow
    • IntraVert status
      ● Still pre 1.0
      ● Good docs
        – https://github.com/zznate/intravert-ug/wiki/_pages
      ● Functionally equivalent to Thrift (most features ported)
      ● CQL support
      ● Virgil (coming soon)
      ● HBase-like scanners (coming soon)
    • Hack at it: https://github.com/zznate/intravert-ug
    • Questions?