Brisk hadoop june2011

Brisk - Truly peer-to-peer Hadoop.

Brisk is an open-source Hadoop and Hive distribution that uses Apache Cassandra for its core services and storage. Brisk runs Hadoop MapReduce on top of CassandraFS, an HDFS-compatible storage layer. By replacing HDFS with CassandraFS, users can run MapReduce jobs on Cassandra’s peer-to-peer, fault-tolerant and scalable architecture.

With CassandraFS, all nodes are peers. Data files can be loaded through any node in the cluster, and any node can serve as the JobTracker for MapReduce jobs. The Hive MetaStore is stored and accessed as just another column family (table) in the distributed data store. Brisk makes Hadoop truly peer-to-peer.

We demonstrate visualisation and monitoring of Brisk using OpsCenter. The operational simplicity of Cassandra’s multi-datacenter and multi-region aware replication makes Brisk well suited to a rich set of applications and use cases. And because HDFS and online data can be stored and kept isolated within the same data cluster, Brisk makes analytics possible without ETL!
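
The MapReduce jobs mentioned in the description above follow the usual pattern: a map step that works on small pieces of the input in parallel, and a reduce step that combines the partial results. The following is only a minimal sketch of that pattern in plain Java streams. It does not use Hadoop, CassandraFS or any Brisk API, and the class and method names (WordCountSketch, wordCount) are hypothetical, chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration of the map/reduce pattern; not Brisk or Hadoop code.
public class WordCountSketch {

    // "Map": split each line into words, in parallel across lines.
    // "Reduce": group identical words and count them.
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.parallelStream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(w -> w, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("brisk runs hadoop", "hadoop runs on cassandra");
        // The TreeMap keeps the words in sorted order when printed.
        System.out.println(wordCount(lines));
    }
}
```

In a real Hadoop job the map and reduce steps run as tasks on many machines and the input comes from the distributed file system. Here everything runs in one JVM, which is enough to show the shape of the computation.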

LA Scalability Talk, Mahalo
May 31, 2011

Transcript

  • 1. Brisk: Truly peer-to-peer Hadoop. srisatish.ambati AT gmail.com, DataStax/OpenJDK, @srisatish
  • 2. Brisk: Hive + Hadoop + Cassandra
  • 3. Map Reduce
  • 4. Have large sets of data & you can work on small pieces in parallel.
  • 5. Map Reduce
  • 6. Multi-core map reduce framework, Kunle, et al.
  • 7. Parallel Execution View
  • 8.
  • 9.
  • 10. JobTracker, NameNode, HDFS
  • 11. Write-once-read-many! A file once created, written & closed need not change
  • 12. Move computation, not data
  • 13.
  • 14. DataNodes: Read, Write Blocks
  • 15. NameNode: Single Master node. Single Machine Address space. Single Point of failure.
  • 16. When “it” does not fit in a single node! … Enter the distributed dragon! Enter the Cassandra: High Scale Peer-to-peer
  • 17. NameNode, DataNodes
  • 18. One-kind-of-node!
  • 19. Cassandra: High Scale Peer-to-peer
  • 20. Portfolio Demo. Low latency: Live tick prices for stocks. Batch Analytics: Historical EOD prices. Value at Risk. http://www.datastax.com/docs/0.8/brisk/brisk_demo
  • 21. Demo URLs (good for this demo only):
      http://ec2-50-19-4-143.compute-1.amazonaws.com:8888/opscenter/index.html
      http://ec2-67-202-12-176.compute-1.amazonaws.com:50030/jobdetails.jsp?job
      http://ec2-50-19-4-143.compute-1.amazonaws.com:8983/portfolio/
  • 22. Dynamo, 2007; Bigtable, 2006; OSS, 2008; Incubator, 2009; TLP, 2010
  • 23. Cassandra: High Scale Peer-to-peer. No SPOF. [Ring diagram: nodes A, F, L, P, T, U, W, Y; Key “C”]
  • 24.
  • 25.
  • 26. Brisk
  • 27. Brisk HowStuffWorks version
  • 28. YDH security edition (soon to be Apache). Apache Hive: SQL-like access. CassandraHandler. Cassandra 0.8.
  • 29. Use ColumnFamilies: inode, sblock
  • 30. String keyspace = "cfs";
      CfDef cf = new CfDef();
      cf.setName(inodeDefaultCf);
      cf.setComparator_type("BytesType");
      …
      cf.setName(sblockDefaultCf);
      cf.setKey_cache_size(1000000); // written as "1M" on the slide
      cf.setComment("Stores blocks of information associated with a inode");
      cf.setKeyspace(keyspace);
  • 31. Consistency: R + W > N
      "brisk.consistencylevel.read", "QUORUM";
      "brisk.consistencylevel.write", "QUORUM";
  • 32. Hadoop: job tracker, task tracker
  • 33. BriskSnitch: brisk nodes, cassandra nodes
  • 34. BriskSimpleSnitch.java
      // Nodes running Hadoop trackers are placed in the Brisk DC, all others in the Cassandra DC.
      if (TrackerInitializer.isTrackerNode)
      {
          myDC = BRISK_DC;
          logger.info("Detected Hadoop trackers are enabled, setting my DC to " + myDC);
      }
      else
      {
          myDC = CASSANDRA_DC;
          logger.info("Looks like Vanilla Cassandra nodes, setting my DC to " + myDC);
      }
  • 35. Hive: SQL-like access. cli, hwi, jdbc, metastore. Pushdown predicates (v beta2)
  • 36. hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
      hive> LOAD DATA LOCAL INPATH '$BRISK_HOME/resources/hive/examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
      hive> SELECT count(*), ds FROM invites GROUP BY ds;
      http://www.datastax.com/docs/0.8/brisk/about_hive
  • 37. ETL, Real-time, Cassandra CFs, DataCenters, Scale
  • 38.
  • 39. No me in team! Ben Coverston, Michael Allen, Ben Werther, Mike Bulman, Brandon Williams, Michael Weir, Cathy Daw, Nate McCall, Daria Hutchinson, Nick M Bailey, Jackson Chung, Patricio Echague, Jake Luciani, Tyler Hobbs, Joaquin Casares, SriSatish Ambati, Jonathan Ellis, Yewei Zhang
  • 40. 100-node Brisk Cluster on Opscenter
  • 41. Dynamo, 2007 + Bigtable, 2006; OSS, 2008; Incubator, 2009; TLP, 2010. [Diagram: Cassandra + (image) + (image), Brisk]
  • 42. git clone git@github.com:riptano/brisk.git
      http://www.datastax.com/product/brisk
      Getting Started via Brisk AMI. Mahalo. Thank You.
  • 43. References
      ● MapReduce: Simplified Data Processing on Large Clusters, 2004, Jeffrey Dean and Sanjay Ghemawat, http://bit.ly/googmr_pdf
      ● Multi-core MapReduce, Kunle, et al., http://bit.ly/iRJd1n
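
Slide 31 states the rule R + W > N and sets both brisk.consistencylevel.read and brisk.consistencylevel.write to QUORUM. The sketch below shows why that combination satisfies the rule: with N replicas, a quorum is a strict majority (N / 2 + 1 using integer division), so any read quorum and any write quorum share at least one replica. The class and method names (QuorumSketch, quorum, readsSeeLatestWrite) are hypothetical and are not part of Brisk or Cassandra.

```java
// Hypothetical illustration of the R + W > N rule; not Brisk or Cassandra code.
public class QuorumSketch {

    // Quorum size for replication factor n: a strict majority of the replicas.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    // A read of r replicas must overlap a write to w replicas when r + w > n.
    static boolean readsSeeLatestWrite(int r, int w, int n) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;          // replication factor, assumed for the example
        int q = quorum(n);  // 2 when n is 3
        System.out.println("N=" + n + " QUORUM=" + q
                + " overlap=" + readsSeeLatestWrite(q, q, n));
        // Reading and writing a single replica each does not guarantee overlap.
        System.out.println("ONE/ONE overlap=" + readsSeeLatestWrite(1, 1, n));
    }
}
```

With N = 3, QUORUM reads and writes each touch 2 replicas, and 2 + 2 > 3, so a read always reaches at least one replica that holds the latest write.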