Brisk hadoop june2011_sfjava

Brisk: Truly peer-to-peer Hadoop
Talk at SFJava

http://bit.ly/jqClhK

Published in: Technology

  1. Brisk: Truly peer-to-peer Hadoop. srisatish.ambati AT gmail.com, Apache Cassandra/OpenJDK, @srisatish
  2. Brisk: Hive + Hadoop + Cassandra
  3. Map Reduce
  4. Have large sets of data & you can work on small pieces in parallel.
  5. Map Reduce
  6. Multi-core map reduce framework, Kunle, et al.
  7. Parallel Execution View
  8. (no text on slide)
  9. (no text on slide)
  10. JobTracker, NameNode, HDFS
  11. Write-once-read-many! A file once created, written & closed need not change.
  12. Move computation, not data
  13. (no text on slide)
  14. DataNodes: Read, Write Blocks
  15. NameNode: Single master node. Single machine address space. Single point of failure.
  16. When “it” does not fit in a single node! … Enter the distributed dragon! Enter the Cassandra: High Scale Peer-to-peer
  17. NameNode, DataNodes
  18. One-kind-of-node!
  19. Cassandra: High Scale Peer-to-peer
  20. Portfolio Demo. Low latency: live tick prices for stocks. Batch analytics: historical EOD prices, Value at Risk. http://www.datastax.com/docs/0.8/brisk/brisk_demo
  21. Demo URLs (good for this demo only): http://ec2-50-19-4-143.compute-1.amazonaws.com:8888/opscenter/index.html http://ec2-67-202-12-176.compute-1.amazonaws.com:50030/jobdetails.jsp?job http://ec2-50-19-4-143.compute-1.amazonaws.com:8983/portfolio/
  22. Dynamo, 2007; Bigtable, 2006; OSS, 2008; Incubator, 2009; TLP, 2010
  23. Cassandra: High Scale Peer-to-peer. No SPOF. (Diagram labels: Y, A, W, U, F, T, L, P; Key “C”)
  24. “dynamic” column families. (Diagram labels: Following; zznate; driftx:, thobbs:; driftx; thobbs; zznate:; jbellis; driftx:, mdennis:, pcmanus:, thobbs:, xedin:, zznate:)
  25. (no text on slide)
  26. (no text on slide)
  27. Brisk
  28. Brisk HowStuffWorks version
  29. YDH security edition (soon to be Apache). Apache Hive: SQL-like access. Cassandra 0.8. CQL interface. Apache Thrift.
  30. Use ColumnFamilies: inode, sblock
  31. String keyspace = "cfs";
      CfDef cf = new CfDef();
      cf.setName(inodeDefaultCf);
      cf.setComparator_type("BytesType");
      …
      cf.setName(sblockDefaultCf);
      cf.setKey_cache_size(1000000); // written as 1M on the slide
      cf.setComment("Stores blocks of information associated with a inode");
      cf.setKeyspace(keyspace);
  32. Consistency: R + W > N. "brisk.consistencylevel.read", "QUORUM"; "brisk.consistencylevel.write", "QUORUM";
  33. Hadoop: job tracker, task tracker
  34. BriskSnitch: brisk nodes, cassandra nodes
  35. BriskSimpleSnitch.java
      if (TrackerInitializer.isTrackerNode)
      {
          myDC = BRISK_DC;
          logger.info("Detected Hadoop trackers are enabled, setting my DC to " + myDC);
      }
      else
      {
          myDC = CASSANDRA_DC;
          logger.info("Looks like Vanilla Cassandra nodes, setting my DC to " + myDC);
      }
  36. Hive: SQL-like access. cli, hwi, jdbc, metastore. Pushdown predicates (v beta2).
  37. hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
      hive> LOAD DATA LOCAL INPATH '$BRISK_HOME/resources/hive/examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
      hive> SELECT count(*), ds FROM invites GROUP BY ds;
      http://www.datastax.com/docs/0.8/brisk/about_hive
  38. ETL, Real-time, Cassandra CFs, DataCenters, Scale
  39. (no text on slide)
  40. No me in team! Ben Coverston, Jonathan Ellis, Ben Werther, Michael Allen, Brandon Williams, Mike Bulman, Cathy Daw, Nate McCall, Daria Hutchinson, Nick M Bailey, Eric Gilmore, Patricio Echague, Jackson Chung, Tyler Hobbs, Jake Luciani, SriSatish Ambati, Joaquin Casares, Yewei Zhang
  41. 100-node Brisk Cluster on Opscenter
  42. Dynamo, 2007; Bigtable, 2006; + OSS, 2008; Incubator, 2009; TLP, 2010; Cassandra + + Brisk
  43. Git started: git clone git@github.com:riptano/brisk.git http://www.datastax.com/product/brisk Getting started via Brisk AMI. Thank you.
  44. References
      ● MapReduce: Simplified Data Processing on Large Clusters, 2004, Jeffrey Dean and Sanjay Ghemawat, http://bit.ly/googmr_pdf
      ● Multi-core MapReduce, Kunle, et al. http://bit.ly/iRJd1n
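
Slide 4 sums up the MapReduce idea: split a large data set into small pieces and work on them in parallel. A minimal sketch of that idea in plain Java follows. It uses no Hadoop or Brisk APIs; the class `MapReduceSketch` and its methods are hypothetical names chosen for illustration, and word counting is an assumed example workload, not one taken from the talk.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration only; not part of Hadoop or Brisk.
final class MapReduceSketch {

    /** Map phase: count the words in one chunk, independently of all other chunks. */
    static Map<String, Integer> mapChunk(String chunk) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : chunk.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Reduce phase: merge the per-chunk results by summing the counts for each word. */
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, n) -> total.merge(word, n, Integer::sum));
        }
        return total;
    }

    /** Runs the map phase over the chunks in parallel, then reduces sequentially. */
    static Map<String, Integer> wordCount(List<String> chunks) {
        List<Map<String, Integer>> partials = chunks.parallelStream()
                .map(MapReduceSketch::mapChunk)
                .collect(Collectors.toList());
        return reduce(partials);
    }

    public static void main(String[] args) {
        List<String> chunks = Arrays.asList(
                "brisk hive hadoop", "hadoop cassandra", "cassandra hadoop");
        System.out.println(wordCount(chunks));
    }
}
```

Each chunk is mapped without reference to any other chunk, which is what lets a framework schedule the map work on whichever node already holds the data (slide 12).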
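
Slide 32 states the consistency rule R + W > N and sets both read and write levels to QUORUM. The arithmetic behind that choice can be sketched as below. This is an illustration under stated assumptions: `QuorumSketch` is a hypothetical helper, not Brisk or Cassandra code, and the replication factor of 3 is assumed for the example. Quorum is taken as a majority of replicas, floor(N / 2) + 1.

```java
// Hypothetical illustration only; not part of Brisk or Cassandra.
final class QuorumSketch {

    /** Number of replicas in a majority for replication factor n: floor(n / 2) + 1. */
    static int quorum(int n) {
        return n / 2 + 1;
    }

    /** True when any r replicas read must share a replica with any w replicas written. */
    static boolean overlaps(int r, int w, int n) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;          // replication factor, assumed for this example
        int q = quorum(n);  // majority of n replicas
        System.out.println("QUORUM for N=" + n + ": " + q);
        System.out.println("QUORUM read + QUORUM write overlap: " + overlaps(q, q, n));
        System.out.println("ONE read + ONE write overlap: " + overlaps(1, 1, n));
    }
}
```

Because two majorities of the same N replicas always share at least one replica, QUORUM reads paired with QUORUM writes satisfy R + W > N for any replication factor.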