Cassandra London
Our sponsors: Acunu
But first, a short back story…
GC HELL!
Please volunteer if you would like to give a talk; Internet fame awaits.
• My experience with Cassandra in production is positive
• Analytics is more difficult than it could be
• Welcome, Brisk! Brisk combines Hadoop, Hive and Cassandra in a "distribution"
In a nutshell:
• CassandraFS as an HDFS-compatible layer; no namenode, no SPOF
• Can split the cluster for OLAP and OLTP workloads, scaling up either as required

Demonstrating Brisk… Building an Ad Network!
The plan:
• Simple data model: segment users into buckets
• System to put users into buckets via a pixel
• Real-time queries
• Analytics

We Have Your Kidneys
The ad network for the paranoid generation
• Cookie-based identification
The API provides:
• Add a user to a bucket (including the ability to define an expiry time)
• Get the buckets a user belongs to

Setup Brisk
http://www.datastax.com/docs/0.8/brisk/install_brisk_ami
Step-by-step guide with pictures!
• Ubuntu 10.10 image with RAID 0 ephemeral disks
• Jairam has been bug-fixing some minor issues
Data model
CF = users:    [userUUID][segmentID] = 1
CF = segments: [segmentID][userUUID] = 1
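The point of the dual index above is that every write lands in both column families, so both lookup directions are a single-row read. A minimal in-memory sketch of that pattern (plain dicts standing in for the two CFs; all names are illustrative, not part of the talk):

```python
# Toy model of the two column families: writes are mirrored into both
# indexes, so each read direction is one row lookup.
users = {}     # userUuid  -> {segmentId: 1}
segments = {}  # segmentId -> {userUuid: 1}

def add_user_to_segment(user_uuid, segment_id):
    # Mirror the write into both "column families".
    users.setdefault(user_uuid, {})[segment_id] = 1
    segments.setdefault(segment_id, {})[user_uuid] = 1

def segments_for_user(user_uuid):
    # Single-row read of the users CF.
    return sorted(users.get(user_uuid, {}))

def users_in_segment(segment_id):
    # Single-row read of the segments CF.
    return sorted(segments.get(segment_id, {}))

add_user_to_segment("u1", "sports")
add_user_to_segment("u1", "cars")
add_user_to_segment("u2", "sports")
print(segments_for_user("u1"))    # ['cars', 'sports']
print(users_in_segment("sports")) # ['u1', 'u2']
```

The cost is double writes, but in Cassandra writes are cheap and there are no server-side joins, so denormalising both directions up front is the idiomatic choice.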
Data model

create keyspace whyk
    with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
    and strategy_options = [{replication_factor:1}];

create column family users
    with comparator = 'AsciiType'
    and rows_cached = 5000;

create column family segments
    with comparator = 'AsciiType'
    and rows_cached = 5000;
Our pixel

http://wehaveyourkidneys.com/add.php?segment=<alphaNumericCode>&expire=<numberOfSeconds>

We'll use Cassandra's expiring columns feature.

PHP code (uses phpcassa):

$pool = new ConnectionPool('whyk', array('localhost'));
$users = new ColumnFamily($pool, 'users');
$segments = new ColumnFamily($pool, 'segments');

$users->insert($userUuid, array($segment => 1), NULL /* default TS */, $expires);
$segments->insert($segment, array($userUuid => 1), NULL /* default TS */, $expires);
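The final `$expires` argument is a per-column TTL: once it elapses, the column simply disappears from reads (and is later compacted away), which is what lets segment membership lapse with no cleanup job. A toy sketch of those semantics in plain Python (the class and its API are invented for illustration, not a real client):

```python
import time

class ExpiringColumns:
    """Toy model of Cassandra's expiring columns: each column carries a TTL."""

    def __init__(self):
        self._rows = {}  # key -> {column: (value, expires_at)}

    def insert(self, key, columns, ttl, now=None):
        now = time.time() if now is None else now
        row = self._rows.setdefault(key, {})
        for col, val in columns.items():
            # A TTL of None means the column never expires.
            row[col] = (val, None if ttl is None else now + ttl)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        row = self._rows.get(key, {})
        # Expired columns are invisible to reads.
        return {c: v for c, (v, exp) in row.items() if exp is None or exp > now}

cf = ExpiringColumns()
cf.insert("u1", {"sports": 1}, ttl=60, now=1000)
print(cf.get("u1", now=1030))  # {'sports': 1}  -- still live
print(cf.get("u1", now=1070))  # {}             -- expired after 60s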
Real-time access

http://wehaveyourkidneys.com/show.php

$pool = new ConnectionPool('whyk', array('localhost'));
$users = new ColumnFamily($pool, 'users');
// @todo this only gets the first 100 columns!
$segments = $users->get($userUuid);
header('Content-Type: application/json');
echo json_encode(array_keys($segments));
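The `@todo` flags a real bug: a plain `get()` returns only the first 100 columns by default, so a user in more than 100 segments would be silently truncated. The usual fix is to page through the row, restarting each bounded fetch at the last column seen. A sketch of that loop against a stand-in fetch function (not a live cluster; `fetch`'s signature mimics a client get but is an assumption here):

```python
def page_all_columns(fetch, key, page_size=100):
    """Collect every column of a wide row via repeated bounded gets.

    `fetch(key, column_start, column_count)` stands in for a client call and
    must return columns in comparator order.
    """
    result = {}
    start = ""
    while True:
        page = fetch(key, column_start=start, column_count=page_size)
        result.update(page)
        if len(page) < page_size:
            return result  # short page: we have reached the end of the row
        # The next page starts at the last column seen; the range bound is
        # inclusive, so that column comes back once more and the dict update
        # deduplicates it.
        start = max(page)

# Stand-in backend: one wide row with 250 columns.
ROW = {f"seg{i:03d}": 1 for i in range(250)}

def fake_fetch(key, column_start, column_count):
    cols = sorted(c for c in ROW if c >= column_start)
    return {c: ROW[c] for c in cols[:column_count]}

all_cols = page_all_columns(fake_fetch, "u1", page_size=100)
print(len(all_cols))  # 250
```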
Analytics

How many users are in each segment? Launch Hive (very easy!):

root@brisk-01:~# brisk hive

CREATE EXTERNAL TABLE whyk.users
    (userUuid string, segmentId string, value string)
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ("cassandra.columns.mapping" = ":key,:column,:value");

SELECT segmentId, count(1) AS total
FROM whyk.users
GROUP BY segmentId
ORDER BY total DESC;
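For intuition about what the Hive job computes: each (userUuid, segmentId) column becomes one table row, and the query is a group-by count over segmentId, ordered descending. The same aggregation over a tiny hand-made sample in plain Python (sample data invented for illustration):

```python
from collections import Counter

# Sample (userUuid, segmentId) rows, as the Hive external table exposes them.
rows = [
    ("u1", "sports"), ("u1", "cars"),
    ("u2", "sports"), ("u3", "sports"),
]

# Equivalent of: SELECT segmentId, count(1) AS total
#                GROUP BY segmentId ORDER BY total DESC
totals = Counter(segment for _, segment in rows)
for segment, total in totals.most_common():
    print(segment, total)
# sports 3
# cars 1
```

The difference, of course, is that Hive runs this as a MapReduce job across the whole cluster, which is exactly what Brisk's OLAP side is for.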
Summary
http://www.flickr.com/photos/sovietuk/2956044892/sizes/o/in/photostream/

Cassandra + Hadoop = Brisk

Editor's Notes

  • #4 Started at Imagini, May 2010. New ad-targeting product! Lots of users. MySQL DB for profiles, MySQL-based server for events reporting. The profile DB cannot update rows, so we only insert; this means clients have to merge together all rows for a user on every read. The MySQL DB has a habit of dying, requiring a repair and downtime; having two DBs managed to put off total death, but not for long.
  • #5 Choosing Cassandra after some research; no single point of failure attractive, high write throughput attractive, linear scaling attractive. Welcome to GC hell! Start Cassandra London, like Alcoholics Anonymous: a support network.
  • #6 Batch analytics: how? No Hive support, no support for streaming jar. Pig input reader, but no output reader; requires HDFS.
  • #7 Keep up the meetups. Acunu generous at providing speakers; downside is hearing the sales pitch! 0.7 comes along; downside is it is not compatible with 0.6 (Thrift interface changes). 0.8 comes along: CQL, counters. Brisk!
  • #9 A summary
  • #10 Some points about "distribution". Some points about Cloudera and reaction.
  • #11 Real-time + batch analytics combined. No single point of failure; we don't need Hadoop's namenode anymore. Cross-DC clusters.
  • #14 No ads. No network. No publishers. Cool domain name.
  • #15 User segments / buckets: have some ID, can expire (we'll use Cassandra's expiring columns). Real-time updates via a simple script (via CQL?). Real-time queries (is a user in this segment? What segments is user X in?). Analytics (how many users in each segment? How many segments are users in on average? Std dev?).