Hadoop meets Rex (How to construct a Hadoop cluster with Rex)



I'll attend an open source seminar ("Open source framework for practical Hadoop").

This is my presentation file.



Usage Rights

© All Rights Reserved

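  • This is the presentation material for the seminar on Thursday, 2013-10-10.

    1. Introduction to Hadoop system installation and operation
    2. Introduction to big data analysis algorithms and case studies
    3. Data analysis case studies using open source big data analysis tools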
Presentation Transcript

  • Hadoop meets (R)?ex: how to use Rexify to construct a Hadoop cluster
    2013-08-26
    Original Rex base image: http://rexify.org
    Original Hadoop image: http://hadoop.apache.org
  • Background
  • Mission
    • I'm not a S/W developer any more
    • I'm not a system engineer
    • But I had to construct Hadoop clusters
      – Moreover, of various types...
    Img: http://www.gkworld.com/product/GKW49102/Simpsons-Cruel-Fate-Why-Mock-Me-Homer-Magnet-SM130.html
  • Hadoop is
    • A Hadoop cluster consists of many Linux boxes
    • Hadoop has many configuration files and parameters
    • Besides Hadoop itself, a variety of Hadoop ecosystem S/W has to be installed
    • Apart from Hadoop and its ecosystem, many other types of S/W have to be installed & configured
      – Tomcat, Apache, DBMS, other development tools, other utils/libs…
    • And so on…
  • At first,
    • I did it manually
      – Install & configure..
      – Install & configure
      – Install & configure
      – Install & configure
      – ….
    Img: http://www.construire-en-vendee.fr/la-construction-dune-maison-de-a-a-z-les-fondations.html
  • Tiresome!!
    • It is a really tedious & horrible job!!
    Img: http://cuteoverload.com/2009/08/17/your-story-has-become-tiresome/
  • Find another way
    • I decided to find another way!!
    • I started to survey other solutions
    Img: http://www.101-charger.com/wallpapers/21526,jeux,gratuit,pathfinder,7.html
  • Survey
  • Various solutions
    • Hadoop Managers
    • Provisioning Tools
    • Parallel SSH Tools
    Img: http://www.cbsnews.com/8301-505125_162-31042083/duke-research-monkeys-like-humans-want-variety/
  • Hadoop Managers
    Hortonworks Management Center™
    Cloudera's CDH™
    * Apache Ambari
  • Provisioning Tools
    Fabric (Python)
  • Parallel SSH Tools http://dev.naver.com/projects/dist/ https://code.google.com/p/parallel-ssh/ http://sourceforge.net/projects/clusterssh/
  • Examination (1/3)
    • Hadoop Managers
      ↑ Specialized for Hadoop
      ↑ Already proven
      ↑ Comfortable
      ↓ Commercial or restrictive license
      ↓ No support for apps/libs other than Java/Hadoop/Hadoop ecosystem
  • Other solutions
    • Hadoop Managers
    • Provisioning Tools
    • Parallel SSH Tools
    → I have no money
    → I want to use additional resources
    ※ Recently, there have been many changes in license policy. Please check it!!
    Img: http://www.bizbuilder.com/how-much-does-an-inexpensive-franchise-cost/
  • Examination (2/3)
    • Other provisioning tools
      ↑ Powerful
      ↑ Many features
      ↑ Detailed control
      ↓ Complexity
      ↓ Need a lot of study
  • Other solutions
    • Hadoop Managers
    • Provisioning Tools
    • Parallel SSH Tools
    → I don't like to study
    Img source: www.mbc.co.kr
  • Examination (3/3)
    • Other parallel SSH tools
      ↑ Simple
      ↑ Useful
      ↑ No need to install an extra agent
      ↓ Some features are insufficient
      ↓ All exceptional cases have to be considered
  • Other solutions
    • Hadoop Managers
    • Provisioning Tools
    • Parallel SSH Tools
    → Yes, I'm greedy
    Img: http://bluebuddies.com/Smurfs_Panini_Smurf_Stickers-7.htm
  • So, what is it?
    ● Simple &
    ● Powerful &
    ● No cost &
    ● Expandable &
    ● Smart way???
    Img: http://plug.hani.co.kr/heihei9999/459415
  • I have found a solution
  • http://rexify.org/ It is Rex!!
  • Rex is
    ● uses just ssh
    ● no agent required
    ● seamless integration
    ● no conflicts
    ● easy to use
    ● easy to extend
    ● easy to learn
    ● can use the advanced power of Perl
    Img: http://swapiinthehouse.blogspot.kr/2012/02/final-term-was-over-and-let-holiday.html
  • Rex options
    [onycom@onydev: ~]$rex -h
    (R)?ex - (Remote)? Execution
    -b   Run batch
    -e   Run the given code fragment
    -E   Execute task on the given environment
    -H   Execute task on these hosts
    -G   Execute task on these group
    -u   Username for the ssh connection
    -p   Password for the ssh connection
    -P   Private Keyfile for the ssh connection
    -K   Public Keyfile for the ssh connection
    -T   List all known tasks.
    -Tv  List all known tasks with all information.
    -f   Use this file instead of Rexfile
    -h   Display this help
    -M   Load Module instead of Rexfile
    -v   Display (R)?ex Version
    -F   Force. Don't regard lock file
    -s   Use sudo for every command
    -S   Password for sudo
    -d   Debug
    -dd  More Debug (includes Profiling Output)
    -o   Output Format
    -c   Turn cache ON
    -C   Turn cache OFF
    -q   Quiet mode. No Logging output
    -Q   Really quiet. Output nothing.
    -t   Number of threads to use
  • Basic Grammar - Authentication
    From: http://www.slideshare.net/jfried/rex-25172864?from_search=3
  • Basic Grammar - Server Group
    From: http://www.slideshare.net/jfried/rex-25172864?from_search=3
  • Basic Grammar - Task
    From: http://www.slideshare.net/jfried/rex-25172864?from_search=3
  • Let's get down to the main subject!
  • Construct hadoop with (R)?ex
  • This presentation is
    ● About how to easily install & configure Hadoop
      – Not about optimization & performance tuning
    ● For ease of understanding,
      – exceptional cases are excluded
    ● No explanation of OS installation
      – no discussion of PXE/kickstart
    ● Reduced environment conditions
      – ex) security, network, other servers/apps, …
    ● I'll talk about the Perl language as little as possible
      – It is not needed
    ● TMTOWTDI
      – Even if it's not refined, I'll show as many different ways as possible
  • Network Topology
    Machines connected through an L2 switch:
    – Onydev (Provision Server)
    – vmaster (Name node / Job Tracker)
    – vnode0 (Data node)
    – vnode1 (Data node)
    – vnode2 (Data node)
    – vmonitor (Monitoring Server)
    [spec]
    – Machines: 6 ea (Hadoop uses just 4 ea)
    – OS: CentOS 6.4 64bit
    – Memory: 32GB (NN), 16GB (DN)
    – CPU: 4 cores (i7, 3.5GHz)
    – Interface: 1G Ethernet
    – Disk: 250G SSD, 1T HDD
    ※ I've configured NN and JT on the same machine
  • Our Hadoop env. is
    ● There is one control account
      – 'hadoop-user'
    ● Hadoop & the Hadoop ecosystem are installed in the 'hadoop-user' account
  • Prepare – All machines
    ● Each machine
      – has the same OS version installed (at least within the Hadoop cluster)
      – has its own fixed IP address
      – can be reached via SSH
      – has one or more normal user accounts, with the sudoers entry edited (optional)
  • Prepare – Provision Server (1/2)
    ● Development tools & environment
      – ex: gcc, glib, make/cmake, perl, etc...
    ● Install Perl modules
      – yum install perl-ExtUtil*
      – yum install perl-CPAN*
      – execute the 'cpan' command
  • Prepare – Provision Server (2/2)
    ● After executing the 'cpan' command
      – cpan 3> install Rex
      – It may fail!!
      – This whole story is based on CentOS 6.XX
    ● So, I recommend 'perlbrew'
      – also if you want to use more of Perl's power
    ※ My guess is that Red Hat may dislike the Perl language
  • To Install Rex (1/3)
    adduser brew-user
    passwd brew-user
    curl -L http://install.perlbrew.pl | bash
    cd /home
    chmod 755 brew-user
    cd ~brew-user
    chmod -R 755 ./perl5
    echo 'export PERLBREW_ROOT="/home/brew-user/perl5/perlbrew"' >> /home/brew-user/.bashrc
    ## Append "$PERLBREW_ROOT/bin" to PATH in .bashrc
    source ~brew-user/.bashrc
  • To Install Rex (2/3)
    ## In the brew-user account,
    perlbrew init
    perlbrew available
    ### Choose the recommended stable perl 5.18.0 (as of 2013/07/11)
    perlbrew install perl-5.18.0
    perlbrew switch perl-5.18.0
    [brew-user@onydev: ~]$perlbrew switch perl-5.18.0
    Use of uninitialized value in split at /loader/0x1f2f458/App/perlbrew.pm line 34.
    .........
    A sub-shell is launched with perl-5.18.0 as the activated perl. Run 'exit' to finish it.
  • To Install Rex (3/3)
    ● cpanm Rex
    ● cpan
    ● http://rexify.org/get/
  • Test for Rex
    [onycom@onydev: ~]$which rex
    /home/brew-user/perl5/perlbrew/perls/perl-5.18.0/bin/rex
    [onycom@onydev: ~]$rex -H localhost -u onycom -p blabla -e "say run 'hostname'"
    [2013-10-08 15:36:06] INFO - Running task eval-line on localhost
    [2013-10-08 15:36:06] INFO - Connecting to localhost:22 (onycom)
    [2013-10-08 15:36:07] INFO - Connected to localhost, trying to authenticate.
    [2013-10-08 15:36:07] INFO - Successfully authenticated on localhost.
    onydev
    [onycom@onydev: ~]$
    ● Rexfile
    ● plain text file
  • /etc/hosts - Provision Server
    localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    ... skip .................
    onydev
    ... skip .................
    vmaster
    vnode0
    vnode1
    vnode2
    vmonitor
    ~
  • SSH connection
    ● Between
      – Provision server and other target servers
      – Hadoop master node and data nodes
  • Prepare SSH public key
    [onycom@onydev: ~]$ ssh-keygen -t rsa
    Enter file in which to save the key (/home/onycom/.ssh/id_rsa):
    Created directory '/home/onycom/.ssh'.
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /home/onycom/.ssh/id_rsa.
    Your public key has been saved in /home/onycom/.ssh/id_rsa.pub.
  • Create User
    use Rex::Commands::User;
    # Host groups: vnode[0..2] is a range covering vnode0, vnode1 and vnode2
    group "hadoop_node" => "vmaster", "vnode[0..2]";
    group "all_vm_node" => "vmaster", "vnode[0..2]", "vmonitor";
    my $USER = "hadoop-user";
    desc "Create user";
    task "new_user", group => "all_vm_node", sub {
        create_user "$USER",
            home     => "/home/$USER",
            comment  => "Account for _hadoop",
            password => "blabla";
    };
    [onycom@onydev: Prov]$ rex -f ./hd-su.Rexfile -u root -p <pass> new_user
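The 'vnode[0..2]' entry in the group definitions is a range; the show_jps output later in this deck lists vnode0, vnode1 and vnode2 for it. As a rough illustration of that expansion outside Rex, here is a minimal POSIX shell sketch. The function name 'expand_hosts' is hypothetical: it is not a Rex command and says nothing about how Rex implements ranges internally.

```shell
# Hypothetical helper (not part of Rex): expand "prefix[first..last]suffix"
# patterns into one host name per line. Names without a range are printed as-is.
expand_hosts() {
  for pattern in "$@"; do
    case $pattern in
      *\[*..*\]*)
        prefix=${pattern%%\[*}      # text before the opening bracket
        range=${pattern#*\[}        # drop everything up to the bracket
        range=${range%%\]*}         # keep only "first..last"
        suffix=${pattern#*\]}       # text after the closing bracket
        first=${range%%..*}
        last=${range##*..}
        i=$first
        while [ "$i" -le "$last" ]; do
          printf '%s%s%s\n' "$prefix" "$i" "$suffix"
          i=$((i + 1))
        done
        ;;
      *)
        printf '%s\n' "$pattern"
        ;;
    esac
  done
}
```

Calling expand_hosts vmaster 'vnode[0..2]' vmonitor prints the five host names of the all_vm_node group, one per line, which can be handy for a quick ssh loop before Rex is installed.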
  • Setup SSH for user
    desc "setup ssh for user";
    task "setup_ssh_user", group => "all_vm_node", sub {
        run "mkdir /home/$USER/.ssh";
        # Install the provision server's public key as the user's authorized key
        file "/home/$USER/.ssh/authorized_keys",
            source => "/home/onycom/.ssh/id_rsa.pub",
            owner  => "$USER",
            group  => "$USER",
            mode   => 644;
        run "chmod 700 /home/$USER/.ssh";
    };
    [onycom@onydev: Prov]$ rex -f ./hd-su.Rexfile -u hadoop-user -p <pass> setup_ssh_user
    ※ OK!! Done. Now you can log in to each server without a password. Then do the same thing for the Hadoop NN/DN nodes.
  • Install packages
    parallelism 4;
    desc "Install packages for java";
    task "install_java", group => "all_vm_node", sub {
        install package => "java-1.6.*";
    };
    [onycom@onydev: Prov]$ rex -f ./hd-su.Rexfile -u root -p <pass> install_java
    • Some packages should be installed globally (ex: java, wget, etc.)
    • For Hadoop 1.1.x, Java 1.6 is recommended.
    • Use the parallelism keyword (if the task takes a long time)
  • Install Hadoop (1/3)
    hd1.Rexfile:
    user "hadoop-user";
    private_key "/home/onycom/.ssh/id_rsa";
    public_key "/home/onycom/.ssh/id_rsa.pub";
    group "hadoop_node" => "vmaster", "vnode[0..2]";
    group "all_vm_node" => "vmaster", "vnode[0..2]", "vmonitor";
    desc "prepare_dir";
    task "prepare_dir", group => "hadoop_node", sub {
        run "mkdir Work";
        run "mkdir Download";
        run "mkdir src";
        run "mkdir tmp";
    };
    [onycom@onydev: Prov]$ rex -f ./hd1.Rexfile prepare_dir
  • Install Hadoop (2/3)
    desc "hadoop 1.1.2 download with wget";
    task "get_hadoop", group => "hadoop_node", sub {
        my $f = run "wget http://archive.apache.org/dist/hadoop/core/hadoop-1.1.2/hadoop-1.1.2.tar.gz",
            cwd => "/home/hadoop-user/src";
        say $f;
    };
    ...skip....
    desc "pig 0.11.1 download with wget";
    task "get_pig", group => "hadoop_node", sub {
        my $f = run "wget http://apache.tt.co.kr/pig/pig-0.11.1/pig-0.11.1.tar.gz",
            cwd => "/home/hadoop-user/src";
        say $f;
    };
    ! The Hadoop version & the versions of the Hadoop ecosystem S/W should match. That topic is outside the scope of this presentation.
  • Install Hadoop (3/3)
    my $HADOOP_SRC_DIR = "/home/hadoop-user/src";
    desc "unzip hadoop source files";
    task "unzip_src", group => "hadoop_node", sub {
        run "tar xvfz hadoop-1.1.2.tar.gz", cwd => "$HADOOP_SRC_DIR";
        run "tar xvfz hive-0.11.0.tar.gz", cwd => "$HADOOP_SRC_DIR";
        run "tar xvfz pig-0.11.1.tar.gz", cwd => "$HADOOP_SRC_DIR";
    };
    desc "make link for hadoop source files";
    task "link_src", group => "hadoop_node", sub {
        run "ln -s ./hadoop-1.1.2 ./hadoop", cwd => $HADOOP_SRC_DIR;
        run "ln -s ./hive-0.11.0 ./hive", cwd => $HADOOP_SRC_DIR;
        run "ln -s ./pig-0.11.1 ./pig", cwd => $HADOOP_SRC_DIR;
    };
  • Configuration files (1/3)
    ● System
      – /etc/hosts
    ● Hadoop (../hadoop/conf)
      – masters & slaves
      – hadoop-env.sh
      – hdfs-site.xml
      – core-site.xml
      – mapred-site.xml
  • Configuration files (2/3)
    ● Hadoop ecosystem & other tools
      – ex) Ganglia
      – ex) Flume – agent/collector/master
      – ex) Oozie or flamingo
      – These are skipped in this PPT.
    ● User rc file
    ※ These are just defaults, with no optimization considered
  • Configuration files (3/3)
    The provision server keeps the Hadoop configuration files (../hadoop_conf_repo) and pushes them to Hadoop NN, Hadoop DN 1 ... Hadoop DN n over SSH/SCP with (R)?ex
    ※ Of course, this is just my policy
  • Edit hosts file
    my $target_file = "/etc/hosts";
    my $host_list = <<'END';
    vmaster
    vnode0
    vnode1
    vnode2
    vmonitor
    END
    desc "Add hosts";
    task "add_host", group => "all_vm_node", sub {
        # Read the current file, then write it back with the host list appended
        my $exist_cnt = cat $target_file;
        my $fh = file_write $target_file;
        $fh->write($exist_cnt);
        $fh->write($host_list);
        $fh->close;
    };
    ※ You can consider the Augeas tool for handling system files. Please refer to 'Rex::Augeas' or 'http://augeas.net'
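Note that the add_host task writes the existing content back with the list appended, so running it twice adds the entries twice. One way around that, sketched here in plain POSIX shell rather than Rex, is to append only the lines that are not there yet. 'append_missing_lines' is a hypothetical helper name, and the bare host names stand in for the real 'IP hostname' entries, which the slide does not show.

```shell
# Hypothetical helper: append each line of a newline-separated list ($2) to a
# file ($1) unless an identical line already exists, so re-runs are harmless.
append_missing_lines() {
  target=$1
  printf '%s\n' "$2" | while IFS= read -r line; do
    # Skip empty lines in the list
    [ -n "$line" ] || continue
    # -x: match the whole line, -F: treat the entry as a fixed string
    grep -qxF -- "$line" "$target" 2>/dev/null || printf '%s\n' "$line" >> "$target"
  done
}

# Example against a scratch copy instead of the real /etc/hosts:
#   cp /etc/hosts /tmp/hosts.test
#   append_missing_lines /tmp/hosts.test "vmaster
#   vnode0"
```

The same check could be done inside the Rex task by comparing the output of 'cat' with the list before writing; the shell version is only meant to show the idea.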
  • Setup .bashrc for user (1/2)
    ... skip .....
    my $hadoop_rc=<<'END';
    #Hadoop Configuration
    export JAVA_HOME="/usr/lib/jvm/jre-1.6.0-openjdk.x86_64"
    export CLASSPATH="$JAVA_HOME/lib:$JAVA_HOME/lib/ext"
    export HADOOP_USER="/home/hadoop-user"
    export HADOOP_SRC="$HADOOP_USER/src"
    export HADOOP_HOME="$HADOOP_USER/hadoop"
    export PIG_HOME="$HADOOP_SRC/pig"
    export HIVE_HOME="$HADOOP_SRC/hive"
    END
    ... skip .....
  • Setup .bashrc for user (2/2)
    desc "setup hadoop-user's .rc file";
    task "setup_rc_def", group => "hadoop_node", sub {
        my $fh = file_append ".bashrc";
        $fh->write($base_rc);
        $fh->write($hadoop_rc);
        $fh->close();
    };
    desc "setup hadoop master node .rc file";
    task "setup_rc_master", "vmaster", sub {
        my $fh = file_append ".bashrc";
        $fh->write($master_rc);
        $fh->close();
    };
    .......... skip ............
  • Configure Hadoop (1/6)
    ● 'masters'
    [hadoop-user@vmaster: ~]$cd hadoop/conf
    [hadoop-user@vmaster: conf]$cat masters
    vmaster
    ● 'slaves'
    [hadoop-user@vmaster: conf]$cat slaves
    vnode0
    vnode1
    vnode2
  • Configure Hadoop (2/6)
    • hadoop-env.sh
    ... skip ...
    # The only required environment variable is JAVA_HOME. All others are
    # optional. When running a distributed configuration it is best to
    # set JAVA_HOME in this file, so that it is correctly defined on
    # remote nodes.
    # The java implementation to use. Required.
    # export JAVA_HOME=/usr/lib/j2sdk1.5-sun
    export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64
    #hadoop-user
    #Remove warning message for "HADOOP_HOME" is deprecated
    export HADOOP_HOME_WARN_SUPPRESS=TRUE
  • Configure Hadoop (3/6)
    • hdfs-site.xml
    ... skip ...
    <configuration>
      <!-- modified by hadoop-user -->
      <property>
        <name>dfs.replication</name>
        <value>2</value>
      </property>
      <property>
        <name>dfs.name.dir</name>
        <value>/home/hadoop-user/hdfs/name</value>
      </property>
      <property>
        <name>dfs.data.dir</name>
        <value>/home/hadoop-user/hdfs/data</value>
      </property>
    </configuration>
    ※ This 'replication' value depends on our env.
  • Configure Hadoop (4/6)
    • core-site.xml
    ... skip ...
    <configuration>
      <!-- modified by hadoop-user -->
      <property>
        <name>fs.default.name</name>
        <value>hdfs://vmaster:9000</value>
      </property>
    </configuration>
  • Configure Hadoop (5/6)
    • mapred-site.xml
    .. skip ..
    <property>
      <name>mapred.job.tracker</name>
      <value>vmaster:9001</value>
    </property>
    <!-- 2013.9.11. Increase the timeout setting because of "fail to report status" errors -->
    <property>
      <name>mapred.task.timeout</name>
      <value>1800000</value>
      <description>The number of milliseconds before a task will be terminated if it neither reads an input, writes an output, nor updates its status string.</description>
    </property>
    ※ This 'timeout' value depends only on our env.
  • Configure Hadoop (6/6)
    my $CNF_REPO="hadoop_conf_repo";
    ... skip ...
    my $MAPRED="mapred-site.xml";
    task "upload_mapred", group => "hadoop_node", sub {
        file "$HD_CNF/$MAPRED",
            owner  => $HADOOP_USER,
            group  => $HADOOP_USER,
            source => "$CNF_REPO/$MAPRED";
    };
    my $CORE_SITE="core-site.xml";
    task "upload_core", group => "hadoop_node", sub {
        file "$HD_CNF/$CORE_SITE",
            owner  => $HADOOP_USER,
            group  => $HADOOP_USER,
            source => "$CNF_REPO/$CORE_SITE";
    };
    ... skip ....
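The site files uploaded by these tasks are small, regular XML documents, so the copies in the configuration repository could also be generated rather than hand-edited. A minimal sketch in POSIX shell, assuming the values need no XML escaping; 'hadoop_property' and 'hadoop_site_xml' are hypothetical helper names, not part of Hadoop or Rex.

```shell
# Hypothetical helper: print one Hadoop <property> element for a name and value.
# Assumption: neither argument contains characters that need XML escaping.
hadoop_property() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Hypothetical helper: print a complete *-site.xml document built from
# "name=value" arguments (split at the first "=" of each argument).
hadoop_site_xml() {
  printf '<?xml version="1.0"?>\n<configuration>\n'
  for pair in "$@"; do
    hadoop_property "${pair%%=*}" "${pair#*=}"
  done
  printf '</configuration>\n'
}
```

For example, hadoop_site_xml fs.default.name=hdfs://vmaster:9000 > hadoop_conf_repo/core-site.xml writes a file carrying the same property as the core-site.xml slide, ready for the upload_core task to distribute.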
  • Before going any further
    ● Stop SELinux
      – if it is enforcing
    ● Modify the iptables policy
      – I recommend stopping it while configuration is in progress
  • Let's start Hadoop
    ● Log in to the master node as hadoop-user
      – ssh -X hadoop-user@vmaster
    ● Format the Hadoop namenode
      – hadoop namenode -format
    ● Execute the start script
      – ex) start-all.sh
  • Check Hadoop status
    [hadoop-user@vmaster: ~]$jps -l
    22161 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
    22260 org.apache.hadoop.mapred.JobTracker
    21968 org.apache.hadoop.hdfs.server.namenode.NameNode
    27896 sun.tools.jps.Jps
    [hadoop-user@vmaster: ~]$hadoop fs -ls /
    Found 1 items
    drwxr-xr-x - hadoop-user supergroup 0 2013-10-07 20:33 /tmp
    ※ It seems to be OK. Really?
  • But, life is not easy http://www.trulygraphics.com/tg/weekend/
  • Check status for all DNs
    task "show_jps", "vnode[0..2]", sub {
        say run "hostname";
        my $r = run "jps";
        say $r;
    };
    [onycom@onydev: Prov]$rex -f ./hd2.Rexfile show_jps
    vnode0
    12682 Jps
    12042 TaskTracker
    11934 DataNode
    vnode1
    11669 DataNode
    11778 TaskTracker
    12438 Jps
    vnode2
    11128 DataNode
    11237 TaskTracker
    11895 Jps
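Reading the jps listing by eye works for three nodes; with more nodes the check can be scripted. A minimal sketch in POSIX shell, assuming the plain 'jps' format of one 'PID Name' pair per line as in the output of show_jps; 'check_daemons' is a hypothetical helper name.

```shell
# Hypothetical helper: $1 is the text printed by jps, the remaining arguments
# are the daemon names that must appear in it. Prints "missing: NAME" for each
# absent daemon and returns 1 if any is missing, 0 otherwise.
check_daemons() {
  jps_output=$1
  shift
  status=0
  for daemon in "$@"; do
    # Compare the second field exactly so "DataNode" does not match other names
    if ! printf '%s\n' "$jps_output" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      printf 'missing: %s\n' "$daemon"
      status=1
    fi
  done
  return "$status"
}
```

On a data node this could be called as check_daemons "$(jps)" DataNode TaskTracker, and its exit status used to decide whether the node needs attention.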
  • If there is some problem,
    ● Check again
      – /etc/hosts
      – SELinux & iptables
      – name & data dirs and their permissions in HDFS
      – and so on... (on each node)
    Img: http://blog.lib.umn.edu/isss/undergraduate/2011/11/you-do-have-any-tech-problem.html
  • If you did not run into any problems, or have fixed them,
  • Now you have Hadoop & an automatic MGM/Prov. solution
    Img: https://hadoopworld2011.eventbrite.com/, yonhap
  • Advanced Challenge
  • What more can we do? (1/2)
    ● Add/remove data nodes
    ● Add/remove storage
    ● Integrate with monitoring
      – ex: Ganglia/Nagios
    ● Integrate with other Hadoop ecosystem tools
      – Flume, flamingo, Oozie
    ● Integrate other devices or servers
      – ex: Switch, DB server
  • What more can we do? (2/2)
    ● Sophisticated Hadoop parameter control
      – ex: use XML parsing
    ● Workflow control & batch
    ● Backup
    ● Periodic file system management
      – ex: log files
    ● Web GUI
    ● Make a framework for your own purpose
  • Ref.
    • http://hadoop.apache.org/
    • http://pig.apache.org/
    • http://hive.apache.org/
    • http://confluence.openflamingo.org
    • http://www.openankus.org
    • http://www.rexify.org
    • https://groups.google.com/forum/#!forum/rex-users
    • http://modules.rexify.org/search?q=hadoop
  • http://www.projects2crowdfund.com/what-can-i-do-with-crowdfunding/
  • Thanks
    junkim@onycom.com / rainmk6@gmail.com