Brisk hadoop june2011_sfjava

Brisk: Truly peer-to-peer Hadoop


  srisatish.ambati AT gmail.com
  Apache Cassandra/OpenJDK
  @srisatish

Brisk: Hive + Hadoop + Cassandra

Map Reduce

You have large data sets, and you can
work on small pieces of them in parallel.
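
A minimal sketch of that idea, assuming a word count as the job. The class and method names are hypothetical, and this runs in a single JVM with Java streams rather than through Hadoop's API: each line is an independent piece (map), and identical words are grouped and summed (reduce).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical single-JVM illustration of the map/reduce idea; not Hadoop's API.
class WordCountSketch {

    static Map<String, Integer> wordCount(List<String> lines) {
        return lines.parallelStream()   // each line is an independent piece of work
                // map: split every line into words
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                // shuffle + reduce: group identical words and sum one per occurrence
                .collect(Collectors.groupingBy(
                        word -> word,
                        TreeMap::new,
                        Collectors.summingInt(word -> 1)));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "move computation not data",
                "move data rarely");
        System.out.println(wordCount(lines));
    }
}
```

Because no line depends on another, the map step can be spread across cores here, or across machines in a real cluster.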

Map Reduce

Multicore MapReduce framework,
Kunle, et al.

Parallel Execution View

JobTracker
NameNode
HDFS

Write-once-read-many!
A file, once created, written, and closed, need not be changed.

Move computation, not data

DataNodes: Read, Write Blocks

NameNode:
Single Master node
Single Machine Address space
Single Point of failure

When “it” does not fit in a single node!
… Enter the distributed dragon!

Enter the Cassandra:
High Scale
Peer-to-peer

Portfolio Demo
Low latency:
  Live tick prices for stocks.
Batch analytics:
  Historical EOD prices.
  Value at Risk.

http://www.datastax.com/docs/0.8/brisk/brisk_demo

Demo URLs (good for this demo only)

http://ec250194143.compute1.amazonaws.com:8888/opscenter/index.html
http://ec26720212176.compute1.amazonaws.com:50030/jobdetails.jsp?job
http://ec250194143.compute1.amazonaws.com:8983/portfolio/

Cassandra timeline: Bigtable (2006) + Dynamo (2007); OSS, 2008; Incubator, 2009; TLP, 2010

Cassandra:
High Scale
Peer-to-peer
No SPOF

[Figure: ring of nodes labeled A, F, L, P, T, U, W, Y, with key “C” placed on the ring]

“Dynamic” column families

Column family: Following
  zznate:   driftx, thobbs
  driftx:   (no columns shown)
  thobbs:   zznate
  jbellis:  driftx, mdennis, pcmanus, thobbs, xedin, zznate
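
The Following rows above can be pictured as a map from row key to a sorted map of columns, where every row carries its own set of column names. This is only a rough in-memory sketch: the class and method names are hypothetical, and real access would go through Thrift or CQL rather than Java collections.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model of a "dynamic" column family; not Cassandra's API.
class FollowingSketch {

    // row key -> (column name -> column value); columns are kept sorted by name
    private final Map<String, SortedMap<String, String>> rows = new TreeMap<>();

    // Adds a column named after the followed user; the value is left empty.
    void follow(String user, String followed) {
        rows.computeIfAbsent(user, key -> new TreeMap<>()).put(followed, "");
    }

    // Returns the columns of one row, or an empty map if the row has none.
    SortedMap<String, String> columnsOf(String user) {
        return rows.getOrDefault(user, new TreeMap<>());
    }

    public static void main(String[] args) {
        FollowingSketch following = new FollowingSketch();
        following.follow("zznate", "driftx");
        following.follow("zznate", "thobbs");
        following.follow("thobbs", "zznate");
        System.out.println(following.columnsOf("zznate").keySet());
        System.out.println(following.columnsOf("driftx").keySet());
    }
}
```

No schema change is needed when a row gains a column: following someone new is just another `put` on that row.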

Brisk
HowStuffWorks version

YDH (Yahoo! Distribution of Hadoop), security edition (soon to be Apache)
Apache Hive: SQL-like access
Cassandra 0.8
CQL Interface
Apache Thrift

Use ColumnFamilies
inode
sblock

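import org.apache.cassandra.thrift.CfDef;

String keyspace = "cfs";

// inode column family
CfDef cf = new CfDef();
cf.setName(inodeDefaultCf);
cf.setComparator_type("BytesType");
…

// sblock column family
cf.setName(sblockDefaultCf);
cf.setKey_cache_size(1000000);  // "1M" on the slide; the setter takes a number
cf.setComment("Stores blocks of information associated with a inode");

cf.setKeyspace(keyspace);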

Consistency: R + W > N
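
As a rough illustration of the inequality, take R as the number of replicas read, W as the number of replicas written, and N as the replication factor. When R + W > N, every read set overlaps every write set in at least one replica. The sketch below uses hypothetical helper names and is not Brisk or Cassandra code; it checks the condition for QUORUM on both reads and writes, where a quorum is floor(N / 2) + 1 replicas.

```java
// Hypothetical helper illustrating R + W > N; not part of Brisk or Cassandra.
class QuorumSketch {

    // A quorum is a majority of the replicas: floor(N / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // True when any read set and any write set must share at least one replica.
    static boolean readsSeeLatestWrite(int r, int w, int n) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;
        int q = quorum(n);
        System.out.println("QUORUM for N=3: " + q);
        System.out.println("QUORUM/QUORUM overlaps: " + readsSeeLatestWrite(q, q, n));
        System.out.println("ONE/ONE overlaps: " + readsSeeLatestWrite(1, 1, n));
    }
}
```

With N = 3, QUORUM is 2, so 2 + 2 > 3 holds; reading and writing at ONE gives 1 + 1 = 2, which does not.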

"brisk.consistencylevel.read", "QUORUM";
"brisk.consistencylevel.write", "QUORUM";

Hadoop:
job tracker, task tracker

BriskSnitch:
brisk nodes, cassandra nodes

BriskSimpleSnitch.java

if (TrackerInitializer.isTrackerNode)
{
    myDC = BRISK_DC;
    logger.info("Detected Hadoop trackers are enabled, setting my DC to " + myDC);
}
else
{
    myDC = CASSANDRA_DC;
    logger.info("Looks like Vanilla Cassandra nodes, setting my DC to " + myDC);
}

Hive: SQL-like access
cli, hwi, jdbc, metastore
Pushdown predicates (v beta2)

hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);

hive> LOAD DATA LOCAL INPATH '$BRISK_HOME/resources/hive/examples/files/kv2.txt'
      OVERWRITE INTO TABLE invites PARTITION (ds='20080815');

hive> SELECT count(*), ds FROM invites GROUP BY ds;

http://www.datastax.com/docs/0.8/brisk/about_hive

ETL
Real-time
Cassandra CFs
DataCenters
Scale

No me in team!
● Ben Coverston ● Jonathan Ellis
● Ben Werther ● Michael Allen
● Brandon Williams ● Mike Bulman
● Cathy Daw ● Nate McCall
● Daria Hutchinson ● Nick M Bailey
● Eric Gilmore ● Patricio Echague
● Jackson Chung ● Tyler Hobbs
● Jake Luciani ● SriSatish Ambati
● Joaquin Casares ● Yewei Zhang

100-node Brisk cluster on OpsCenter

[Figure: Cassandra (Bigtable, 2006 + Dynamo, 2007; OSS, 2008; Incubator, 2009; TLP, 2010) added to the other components to form Brisk]

Git started:
git clone git@github.com:riptano/brisk.git
http://www.datastax.com/product/brisk
Getting Started via Brisk AMI.
Thank You.

References
● MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean and Sanjay Ghemawat, 2004. http://bit.ly/googmr_pdf
● Multicore MapReduce, Kunle, et al. http://bit.ly/iRJd1n
