Brisk hadoop june2011_sfjava
Brisk: Truly peer-to-peer hadoop

Talk at SFJava

http://bit.ly/jqClhK

Brisk hadoop june2011_sfjava Presentation Transcript

  • 1. Brisk: Truly peer-to-peer Hadoop. srisatish.ambati AT gmail.com, Apache Cassandra/OpenJDK, @srisatish
  • 2. Brisk: Hive + Hadoop + Cassandra
  • 3. Map Reduce
  • 4. Have large sets of data & you can work on small pieces in parallel.
  • 5. Map Reduce
  • 6. Multi-core map reduce framework, Kunle, et al
  • 7. Parallel Execution View
  • 8.
  • 9.
  • 10. JobTracker, NameNode, HDFS
  • 11. Write-once-read-many! A file once created, written & closed need not change.
  • 12. Move computation, not data
  • 13.
  • 14. DataNodes: Read, Write Blocks
  • 15. NameNode: Single Master node. Single Machine Address space. Single Point of failure.
  • 16. When “it” does not fit in a single node! … Enter the distributed dragon! Enter the Cassandra: High Scale Peer-to-peer
  • 17. NameNode, DataNodes
  • 18. One-kind-of-node!
  • 19. Cassandra: High Scale Peer-to-peer
  • 20. Portfolio Demo. Low latency: live tick prices for stocks. Batch analytics: historical EOD prices, Value at Risk. http://www.datastax.com/docs/0.8/brisk/brisk_demo
  • 21. Demo URLs (good for this demo only):
    http://ec2-50-19-4-143.compute-1.amazonaws.com:8888/opscenter/index.html
    http://ec2-67-202-12-176.compute-1.amazonaws.com:50030/jobdetails.jsp?job
    http://ec2-50-19-4-143.compute-1.amazonaws.com:8983/portfolio/
  • 22. Dynamo, 2007; Bigtable, 2006; OSS, 2008; Incubator, 2009; TLP, 2010
  • 23. Cassandra: High Scale Peer-to-peer. No SPOF. (Ring diagram: nodes A, F, L, P, T, U, W, Y; key “C”)
  • 24. “dynamic” column families. Following (row key, then column names):
    zznate    driftx: thobbs:
    driftx
    thobbs    zznate:
    jbellis   driftx: mdennis: pcmanus: thobbs: xedin: zznate:
  • 25.    
  • 26.    
  • 27. Brisk
  • 28. Brisk HowStuffWorks version
  • 29. YDH security edition (soon to be Apache). Apache Hive: access via SQL-like queries. Cassandra 0.8. CQL Interface. Apache Thrift.
  • 30. Use ColumnFamilies: inode, sblock
  • 31.
        String keyspace = "cfs";
        CfDef cf = new CfDef();
        // inode column family
        cf.setName(inodeDefaultCf);
        cf.setComparator_type("BytesType");
        …
        // sblock column family
        cf.setName(sblockDefaultCf);
        cf.setKey_cache_size(1000000); // 1M keys
        cf.setComment("Stores blocks of information associated with a inode");
        cf.setKeyspace(keyspace);
  • 32. Consistency: R + W > N
        "brisk.consistencylevel.read", "QUORUM";
        "brisk.consistencylevel.write", "QUORUM";
  • 33. Hadoop: job tracker, task tracker
  • 34. BriskSnitch: brisk nodes, cassandra nodes
  • 35. BriskSimpleSnitch.java
        if (TrackerInitializer.isTrackerNode)
        {
            myDC = BRISK_DC;
            logger.info("Detected Hadoop trackers are enabled, setting my DC to " + myDC);
        }
        else
        {
            myDC = CASSANDRA_DC;
            logger.info("Looks like Vanilla Cassandra nodes, setting my DC to " + myDC);
        }
  • 36. Hive: SQL-like access. cli, hwi, jdbc, metastore. Pushdown predicates (v beta2).
  • 37.
        hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
        hive> LOAD DATA LOCAL INPATH '${env:BRISK_HOME}/resources/hive/examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
        hive> SELECT count(*), ds FROM invites GROUP BY ds;
    http://www.datastax.com/docs/0.8/brisk/about_hive
  • 38. ETL, Real-time, Cassandra CFs, DataCenters, Scale
  • 39.
  • 40. No me in team! ● Ben Coverston ● Jonathan Ellis ● Ben Werther ● Michael Allen ● Brandon Williams ● Mike Bulman ● Cathy Daw ● Nate McCall ● Daria Hutchinson ● Nick M Bailey ● Eric Gilmore ● Patricio Echague ● Jackson Chung ● Tyler Hobbs ● Jake Luciani ● SriSatish Ambati ● Joaquin Casares ● Yewei Zhang
  • 41. 100-node Brisk Cluster on Opscenter
  • 42. Dynamo, 2007 + Bigtable, 2006. OSS, 2008; Incubator, 2009; TLP, 2010. Cassandra + + Brisk
  • 43. Git started: git clone git@github.com:riptano/brisk.git
    http://www.datastax.com/product/brisk
    Getting Started via Brisk AMI. Thank You.
  • 44. References
    ● Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, 2004. http://bit.ly/googmr_pdf
    ● Kunle, et al., Multi-core MapReduce. http://bit.ly/iRJd1n
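
Slides 3 to 7 describe MapReduce only in outline: large data sets, small pieces worked on in parallel. As a rough illustration of that flow, here is a minimal single-JVM word count in plain Java. It is a sketch, not Hadoop or Brisk API code; the class `WordCountSketch` and its `map`, `reduce`, and `run` methods are hypothetical names chosen for this example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Toy illustration of map -> shuffle -> reduce inside one JVM.
// All names here are hypothetical; nothing below is Hadoop API.
class WordCountSketch {

    // Map phase: turn one small piece of input (a line) into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: combine all values that share a key.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Each line is independent, so the map calls may run in parallel.
        List<Map.Entry<String, Integer>> mapped = lines.parallelStream()
                .flatMap(line -> map(line).stream())
                .collect(Collectors.toList());

        // Shuffle: group the emitted values by key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> e : mapped) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }

        // Reduce each group to a single count.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "brisk hive hadoop", "hadoop cassandra", "brisk cassandra hadoop");
        System.out.println(run(lines)); // {brisk=2, cassandra=2, hadoop=3, hive=1}
    }
}
```

In a real cluster the map calls run on the nodes that hold the data (slide 12: move computation, not data), and the shuffle moves grouped keys between machines; here both happen in memory.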
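
Slide 32 states the rule R + W > N and sets both the read and write levels to QUORUM. The arithmetic behind that choice can be checked with a few lines of Java. This is a sketch under the assumption that QUORUM means floor(N/2) + 1 replicas, which is how Cassandra defines it; the class `QuorumSketch` and its methods are hypothetical and not part of Brisk.

```java
// Hypothetical helper illustrating the R + W > N rule from slide 32.
class QuorumSketch {

    // QUORUM for a replication factor n: a strict majority of replicas.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    // If r + w > n, every set of r replicas read must share at least one
    // replica with the set of w replicas that acknowledged the latest write.
    static boolean readsSeeLatestWrite(int r, int w, int n) {
        return r + w > n;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            int q = quorum(n);
            System.out.println("N=" + n + " QUORUM=" + q
                    + " R+W>N: " + readsSeeLatestWrite(q, q, n));
        }
        // Reading and writing a single replica with N=3 gives no such overlap.
        System.out.println("ONE/ONE, N=3: " + readsSeeLatestWrite(1, 1, 3));
    }
}
```

With N = 3, QUORUM is 2, so R + W = 4 > 3 and a quorum read overlaps the most recent quorum write on at least one replica. Reading and writing one replica each gives 1 + 1 = 2, which does not exceed 3.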