Apache bigtopwg7142013
Transcript

  • 1. Apache Bigtop Working Group, 7/14/2013
      ● Basic skills
      ● Hadoop pipelines (Roman's/Ron's idea)
      ● Career positioning
  • 2. Basic Skills
      ● This is a working group: you set your own goals. The structure: do a demo in front of the class. Focus on skills employers are looking for.
      ● Cluster skills using AWS: create instances with the ec2-api tools, then extend that with scripts or your own code. You have to demo some skill.
          – Goal: manage multiple instances. You can do this manually, but the number of keystrokes grows rapidly as you add components, so you need some automation or code (see the sketch after this slide).
          – Bash scripts are a good choice because they are what Bigtop's init.d files and Roman's code already use; e.g., copy the mkdir commands into a script and run them.
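A minimal sketch of the automation slide 2 is pointing at, in bash since that is what the slides recommend. It assumes a hosts.txt file listing your instances' hostnames and passwordless SSH with your EC2 key pair; the filename and key path are made up for illustration:

#!/bin/bash
# run-on-cluster.sh: run one command on every instance listed in hosts.txt
# Usage: ./run-on-cluster.sh "sudo mkdir -p /data/hdfs"
CMD="$1"
while read -r HOST; do
    echo "=== ${HOST} ==="
    ssh -i ~/.ssh/my-ec2-key.pem "ec2-user@${HOST}" "${CMD}"
done < hosts.txt

This keeps the keystroke count constant as the cluster grows, which is the point the slide is making.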
  • 3. Basic Skills
      ● Hadoop*: learn all the features of 2.0.0. No training course can give this to you; you will have to work through it manually.
          – Use the 2.0.X unit test code as a base (see the sketch below).
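A hedged example of the "unit test code as a base" idea: check out a 2.0.x tag of the Hadoop source and run a single HDFS test under Maven. The tag, module path, and test class here are illustrative choices, not prescribed by the slides:

# Clone Hadoop, pick a 2.0.x tag, and run one HDFS unit test.
git clone https://github.com/apache/hadoop.git && cd hadoop
git checkout release-2.0.5-alpha        # or whichever 2.0.x tag you are studying
mvn test -pl hadoop-hdfs-project/hadoop-hdfs -Dtest=TestFileCreation

Reading and stepping through one test at a time is a concrete way to learn the 2.0.0 features.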
  • 4. Hadoop 2.0.0
      ● Basic filesystem review:
          – Copy-on-write
          – Write-through/write-back, fsck (see the HDFS example below)
          – Inodes/B-trees, NameNode (NN)/DataNode (DN)
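The fsck bullet maps directly onto HDFS, which ships its own consistency checker; a quick, read-only way to inspect NN/DN metadata on a running cluster (assuming a packaged install where the hdfs superuser exists):

# Report file, block, and replica-placement health for the whole namespace.
sudo -u hdfs hdfs fsck / -files -blocks -locations | less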
  • 5. Working Group
      ● This is not a class that gives you answers; the answers classes give you are too simple to be valuable.
      ● E.g., does YARN/Hadoop 2.0.X support multitenancy? That would mean multiple users/companies can't see each other's data, and a query from one user can't crash the cluster for the others. That isn't the case today (see the queue sketch below for one piece of the puzzle).
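For background on the multitenancy question: YARN's CapacityScheduler can at least carve the cluster into per-tenant queues with capacity caps, which addresses the "crash the cluster" half but not data visibility. A minimal sketch, with made-up queue names and the stock config path assumed:

# Define two tenant queues in capacity-scheduler.xml (illustrative only).
cat > /etc/hadoop/conf/capacity-scheduler.xml <<'EOF'
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>tenantA,tenantB</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.tenantA.capacity</name>
    <value>60</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.tenantB.capacity</name>
    <value>40</value>
  </property>
</configuration>
EOF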
  • 6. Hadoop 2.0.X
      ● ZooKeeper is now part of HDFS operations (e.g., for HA automatic failover) and requires some administration. Open question: do you need to roll back ZooKeeper transaction logs when a ZK cluster fails?
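Related to the transaction-log question: ZooKeeper accumulates snapshots and txn logs that will fill a disk if never purged. Since ZooKeeper 3.4 the server can purge them itself; a sketch against a Bigtop-style config path (the path is an assumption):

# Let ZooKeeper purge old snapshots/transaction logs on its own.
cat >> /etc/zookeeper/conf/zoo.cfg <<'EOF'
# keep the 3 most recent snapshots; run the purge every 24 hours
autopurge.snapRetainCount=3
autopurge.purgeInterval=24
EOF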
  • 7. Bigtop Basic Skills
      ● Run Bigtop in AWS in distributed mode; start with HDFS (see the install sketch below).
      ● Create Hadoop* pipelines (Roman's/Ron's idea).
          – Ron: book. Great idea!
      ● Run mvn verify; learn to debug and write tests here.
      ● This will take months and is demo-driven: people do demos.
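A minimal sketch of the first bullet: pulling Bigtop's HDFS packages onto a fresh yum-based AWS instance. The repo URL pins Bigtop 0.6.0 as an example; check the Bigtop site for the current release:

# Install the Bigtop repo, then HDFS (namenode on the master, datanode on workers).
sudo wget -O /etc/yum.repos.d/bigtop.repo \
  http://archive.apache.org/dist/bigtop/bigtop-0.6.0/repos/centos6/bigtop.repo
sudo yum install -y hadoop-hdfs-namenode     # master only
sudo yum install -y hadoop-hdfs-datanode     # each worker
sudo -u hdfs hdfs namenode -format           # once, on the master
sudo service hadoop-hdfs-namenode start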
  • 8. Career Positioning
      ● Choose where to spend your time.
      ● Big data =
          – 1) Devops
          – 2) App development (e.g., Astyanax)
          – 3) Internals
      ● Don't get distracted into 3); there isn't enough time to do all three well. Let the Cloudera people help you there.
      ● Do something new that people care about.
          – Don't try to be better than people with the same job skill.
          – Learn efficiently: practice, practice, practice. You can't learn by watching.
  • 9. Big Company vs. Small
      ● Big:
          – Extrapolate from Cloudera's strategy: Hadoop 2.0.X runs in the cloud, users access it from the desktop via a browser and can run Hive/Pig on YOUR data; if you need to ingest data, e.g. with Flume, a sysadmin has to set that up. So don't spend time getting Flume to work in Hue. But make sure you know the 2.0.x security models/LDAP, pipeline debugging when things get stuck, failover, and application development.
          – Hue != Ambari. Why?
          – Is there value in building apps in Hue or with Hue? The approach for web apps is shifting away from Hue toward something like Ambari, which is a simpler, user-defined MVC pattern.
          – Why is user-defined MVC better? Think like a manager: what happens as Django adds more complicated features?
          – E.g., the Jetty/J2EE example.
  • 10. Small
      ● Do everything; use Bigtop; get to a working app as fast as possible. 1) and 2) from slide 8 are very important; you have to do things quickly.
      ● You decide how to spend your own time.
  • 11. Structure
      ● Schedule three more meetings after this one, every 2 weeks.
      ● Individual demos.
      ● Install Bigtop; demo WordCount and Pi; demo components and pipelines.
      ● Turn pipeline demos into integration tests.
      ● Test in pseudo-distributed mode and on a cluster.
      ● Listen to Roman: Hue....
  • 12. HBase/Hadoop
      ● HBase requirements: ~48 GB RAM and 8-12 cores per RegionServer node. Memory budget per node:
          – M/R tasks: 1-2 GB+
          – RegionServer (R/S): 32 GB+
          – OS: 4-8 GB
          – plus the HDFS daemons
      ● Disk: reserve ~25% for M/R shuffle files; keep HDFS volumes <50% full; JBOD, no RAID.
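One concrete way to apply the 32 GB+ RegionServer line, assuming a packaged install with configs under /etc/hbase/conf (the path is an assumption; the figure comes from the slide):

# On RegionServer nodes, give the daemon a 32 GB heap per slide 12's budget.
cat >> /etc/hbase/conf/hbase-env.sh <<'EOF'
export HBASE_HEAPSIZE=32768   # in MB
EOF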
  • 13. Starting Hadoop, M/R
      ● Look at the logs in /var/log/hadoop-hdfs.
      ● Cluster ID: it lives in the VERSION file under ~/cache/.../data; change the text there. DEMO.
      ● No connection? Check ping, check core-site.xml, check /etc/hosts.
      ● M/R/YARN: mapred-site.xml. NOTE: M/R uses port 8021 and so does the NameNode. Keep this port but run on a different server; open port 8031.
      ● telnet jt:8021; turn off iptables; disable SELinux (see the checklist sketch below).
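The slide's checks, gathered into one hedged script; "jt" stands in for the JobTracker host exactly as on the slide, and the paths assume a packaged install:

#!/bin/bash
# Connectivity checklist from slide 13; run from a node that can't reach the cluster.
tail -n 50 /var/log/hadoop-hdfs/*.log                  # recent HDFS daemon errors
ping -c 3 jt                                           # reachability; also check /etc/hosts
grep -A1 fs.defaultFS /etc/hadoop/conf/core-site.xml   # is the NameNode address right?
telnet jt 8021                                         # is the port actually open?
sudo service iptables stop                             # rule out the firewall (testing only)
sudo setenforce 0                                      # disable SELinux until reboot (testing only)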
  • 14. M/R Setup
      ● 1 NodeManager; tail of the job output:
          – WRONG_REDUCE=0
          – File Input Format Counters
          – Bytes Read=1180
          – File Output Format Counters
          – Bytes Written=97
          – Job Finished in 92.72 seconds
          – Estimated value of Pi is 3.14080000000000000000
  • 15. M/R AWS
      ● 3 NodeManagers; tail of the same job's output:
          – File Output Format Counters
          – Bytes Written=97
          – Job Finished in 86.762 seconds
          – Estimated value of Pi is 3.14080000000000000000
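For reference, output like the above comes from the Pi estimator in the stock Hadoop examples jar; the jar path below matches a Bigtop-packaged layout and the arguments are arbitrary, so treat both as assumptions:

# 16 map tasks, 100000 samples each; prints "Estimated value of Pi is ..." at the end.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 16 100000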
  • 16. ZooKeeper Administration
  • 17. Many Options for Projects
      ● Integration code testing when Roman gets here in 2 weeks.
      ● Work with Ron or Victor on projects.
      ● Update the wiki with the AWS cluster setup; automate with Whirr? Plus Chef/Puppet?
      ● Add HBase/ZooKeeper process management for Hadoop (monit/supervisord); see the sketch below.
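As a starting point for the last bullet, a hedged supervisord snippet that keeps a ZooKeeper server running; the zkServer.sh path matches a Bigtop-style install, and the program name is made up:

# Supervise ZooKeeper so it is restarted automatically on failure (illustrative).
cat > /etc/supervisord.d/zookeeper.ini <<'EOF'
[program:zookeeper]
command=/usr/lib/zookeeper/bin/zkServer.sh start-foreground
autostart=true
autorestart=true
user=zookeeper
EOF
sudo supervisorctl reread && sudo supervisorctl update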
