Bigtop Working Group
Elance, 6/27/2013
DC, Absolute SW: Intro to BWG, intro to team
Roman, Cloudera: Bigtop creator
Marshall/Ryan, Palomino Labs: BenchPress
Thank You to Our Sponsors for the Donations!
● Elance: meeting space/food. Post your Hadoop jobs here!
● DocuSign/SF: meeting space/food
● Cloudera
● DataPipe: $500 credit, free time for people doing POCs (Gary?)
● Safari Online Books: 30-day donation
● Amazon AWS: $100 credits
Poll
● How many of you are managing POCs?
● How many of you are looking to make a career change into Hadoop*?
Intro
● Technical Architect @ABSW; 2 POCs, HBase & Storm (Mongo doesn't count)
● POC example (overly simplistic):
– Write performance: incoming data, saved to disk
– Read performance: read time; full table scans are far too slow for interactive browser use (reporting)
Slideshare:
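The write/read POC example above can be sketched as a toy timing harness. This is a minimal sketch, assuming a plain Python dict as a hypothetical stand-in for a row-keyed store such as an HBase table; `run_poc` and its parameters are illustrative names, and the timings it prints say nothing about a real cluster. It only shows the shape of the measurement: bulk writes, one keyed read, and one full scan whose cost grows with the number of rows.

```python
import random
import time


def run_poc(n_rows=200_000, seed=42):
    """Toy POC: time bulk writes, one keyed read, and one full-table scan."""
    rng = random.Random(seed)
    table = {}  # hypothetical stand-in for a row-keyed table (NOT HBase)

    # Write performance: ingest incoming rows under zero-padded row keys.
    t0 = time.perf_counter()
    for i in range(n_rows):
        table[f"row{i:08d}"] = rng.randint(0, 999)
    write_s = time.perf_counter() - t0

    # Read performance, keyed access: fetch a single row by its key.
    t0 = time.perf_counter()
    value = table[f"row{n_rows // 2:08d}"]
    get_s = time.perf_counter() - t0

    # Read performance, full scan: a "report" that must inspect every row.
    t0 = time.perf_counter()
    matches = sum(1 for v in table.values() if v == value)
    scan_s = time.perf_counter() - t0

    return {"rows": len(table), "write_s": write_s, "get_s": get_s,
            "scan_s": scan_s, "matches": matches}


if __name__ == "__main__":
    stats = run_poc()
    for name in ("write_s", "get_s", "scan_s"):
        print(f"{name}: {stats[name]:.6f} s")
```

Rerunning with larger `n_rows` shows the scan time growing with the table while the keyed read stays roughly flat, which is the reporting problem the slide points at.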
POCs
● Proof of Concept: verifies scope, architecture and cost
● A Big Data stack implementation consists of:
1) DevOps
2) Application (e.g. Astyanax)
3) Internals: Cloudera/MapR/HW. We don't cover internals. Take CS346, please! (GitHub/redbase)
– We cover 1) and some of 2). For a POC?
Small vs. Large POCs
● GM: >>$1M, $5-$10M; hires Cloudera, world experts who cover 1), 2) & 3)
● At $500k-$1M you get 1 year, and most fail. The team:
– A high-level person (~$200k/year) who doesn't code
– You, as a newly hired tech lead or architect
– 1-2+ programmers who know nothing about Hadoop* but know the business processes
● What happens after this?
Scope Creep: HLP Adds Components and Defines Effort
● Hadoop alone is not a fit; a VC; >1 year; the project fails or becomes a zombie
● Effort is extrapolated from the HLP downloading Hadoop and running wc (word count)
● The HLP learns from web posts and salespeople
● Components: HDFS/Hadoop, HBase, Storm
HLP Gets Info from Big Data Vendors
● The Cassandra vs. Hadoop argument centers on SPOF (single point of failure), but building the application is the difficult part!
– Cassandra vs. HBase; nobody talks about Astyanax. Lethal underspecification of 2).
● You see this in job postings too: J2EE != scalable distributed programming
● Go to Palomino Labs for 2). You have to understand ZooKeeper programming first! PL can also do 1) and 3)
● Java concurrency -> ZooKeeper -> scalable distributed apps
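As a taste of the "ZooKeeper programming" step, here is an in-memory simulation of the ordering logic in ZooKeeper's well-known lock recipe: each client creates an ephemeral sequential node under a lock node, the client with the lowest sequence number holds the lock, and each waiter watches only its immediate predecessor so a release wakes one client instead of all of them. `FakeLockDir`, `holder` and `node_to_watch` are hypothetical names; this is a sketch of the recipe's logic, not the ZooKeeper client API, and it has no real sessions, watches or network.

```python
import itertools


class FakeLockDir:
    """Hypothetical in-memory stand-in for a ZooKeeper lock znode."""

    def __init__(self):
        self._seq = itertools.count()
        self.children = {}  # child node name -> client id

    def create_sequential(self, client):
        # Mimics an EPHEMERAL_SEQUENTIAL create: a zero-padded counter is appended.
        name = f"lock-{next(self._seq):010d}"
        self.children[name] = client
        return name

    def delete(self, name):
        # Mimics a lock release or a session expiry removing the ephemeral node.
        self.children.pop(name, None)

    def sorted_children(self):
        # Zero padding makes lexicographic order match sequence order.
        return sorted(self.children)


def holder(lock_dir):
    """The client whose node has the lowest sequence number holds the lock."""
    kids = lock_dir.sorted_children()
    return lock_dir.children[kids[0]] if kids else None


def node_to_watch(lock_dir, my_node):
    """A waiter watches only its immediate predecessor (avoids the herd effect)."""
    kids = lock_dir.sorted_children()
    i = kids.index(my_node)
    return kids[i - 1] if i > 0 else None


if __name__ == "__main__":
    locks = FakeLockDir()
    a = locks.create_sequential("A")
    locks.create_sequential("B")
    c = locks.create_sequential("C")
    print(holder(locks))            # A
    print(node_to_watch(locks, c))  # lock-0000000001
    locks.delete(a)                 # A releases the lock
    print(holder(locks))            # B
```

In a real deployment the watch, the ephemeral node and the session timeout are what make this safe across machines, which is exactly the part that plain Java concurrency knowledge does not cover.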
HLP && Machine Learning
● BigData == Machine Learning: "find someone who knows R/Mahout", in the same job listing as J2EE
● R & Mahout aren't used in production.
● For this to work you have to be a GOOD server programmer first, not someone who downloads Tomcat and figures out how to stub out REST calls.
● Separate track TBD with Charles Nainen. Need a sample POC with a sponsoring vendor!
What to Do?
● Contribute to Bigtop. Why?
– Teaches you the internals of Bigtop/HBase/Hadoop/Flume, gets you 1), and gives you API practice for 2)
– Add new components to Bigtop: hands-on experience with new components
● Contribute to BenchPress as a first step toward 2). It gets you ZooKeeper; still a long way to go
● We don't cover 3). Not on the road map
Logistics
● Max 20 people. Maple Tree Inn
● Charge $100->$200/month for the room rental
● Meet every 2 weeks to do demos
● Will cover Bigtop & BenchPress; Storm in a future session
● First session: 3 meetings only. We reserve the right to stop these if we run out of time
Not a Class
● Cloudera/HW/MapR have classes: $800-1k/day for 3 days to 1 week
● They have to charge this to pay someone's salary to create the material for you.
● We don't replace this. I took these classes and am not going to steal their material. You will have to read and write test code.
● Same information a new Cloudera employee gets
● This works if the consultants get new business and we get open source code contributions.
Group POC?
● Please talk to Roman/Bruno/Ryan/Charles if you have funding, which gets them new business
● POC mentors:
– Ryan/Marshall: applications on Hadoop*. 9 people. Any POC
– Ron: Chef/Bigtop group project; MongoDB in production
– Roman/Bruno: Hadoop* POC
– DC: Hadoop/Storm
– Charles Nainen: ML POC. Need data/problem description/$
● We can do a group POC. Talk to a POC mentor
Group Sign-Up Sheets
● Group POC
● Form groups of at least 2; $500k-$1M POCs are good business, and you have to do them as a team.
● Skill shortage, not a budget issue
● Working group session signup, for the Safari subscription and AWS codes:
– June 30, July 14, July 28