Hadoop meet Rex (How to construct a Hadoop cluster with Rex)

This is my presentation file for the open source seminar "Open source framework for practical Hadoop".
Notes
  • http://dbmanagement.info/Tutorials/Hadoop.htm
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • 2013.10.10(목) 세미나 발표자료 입니다.

    1. 하둡 시스템 설치 및 운영 소개
    http://www.slideshare.net/womendevel/hadoop-prov-rex
    2. 빅데이터 분석 알고리즘 소개 및 사례
    http://www.slideshare.net/womendevel/big-data-analytics-and-data-mining
    3. 빅데이터 분석 오픈소스를 활용한 데이터 분석 사례
    http://www.slideshare.net/womendevel/mapreduce-based
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
No Downloads
Views
Total Views
1,635
On Slideshare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
87
Comments
2
Likes
9
Embeds 0
No embeds

No notes for slide

Hadoop meet (R)?ex
- How to construct a Hadoop cluster with Rex
2013-08-26
Original Rex base image: http://rexify.org
Original Hadoop image: http://hadoop.apache.org
Background
Mission
• I'm not a S/W developer any more
• I'm not a system engineer
• But I had to construct a Hadoop cluster
  – Moreover, in various types...
Img: http://www.gkworld.com/product/GKW49102/Simpsons-Cruel-Fate-Why-Mock-Me-Homer-Magnet-SM130.html
Hadoop is
• A Hadoop cluster consists of many Linux boxes
• Hadoop has many configuration files and parameters
• Besides Hadoop, a variety of Hadoop ecosystem S/W should be installed
• Beyond Hadoop & the Hadoop eco, many other kinds of S/W must be installed & configured
  – Tomcat, Apache, DBMS, other development tools, other utils/libs…
• And so on…
At first,
• I did it manually
  – Install & configure
  – Install & configure
  – Install & configure
  – Install & configure
  – ...
Img: http://www.construire-en-vendee.fr/la-construction-dune-maison-de-a-a-z-les-fondations.html
Tiresome!!
• It is a really tedious & horrible job!!
Img: http://cuteoverload.com/2009/08/17/your-story-has-become-tiresome/
Finding another way
• I decided to find another way!!
• I started to survey other solutions
Img: http://www.101-charger.com/wallpapers/21526,jeux,gratuit,pathfinder,7.html
Survey
A variety of solutions
• Hadoop Managers
• Provisioning Tools
• Parallel SSH Tools
Img: http://www.cbsnews.com/8301-505125_162-31042083/duke-research-monkeys-like-humans-want-variety/
Hadoop Managers
• Hortonworks Management Center™
• Cloudera's CDH™
• Apache Ambari
Provisioning Tools
• Fabric (Python)
Parallel SSH Tools
• http://dev.naver.com/projects/dist/
• https://code.google.com/p/parallel-ssh/
• http://sourceforge.net/projects/clusterssh/
Examination (1/3)
• Hadoop Managers
  ↑ Specialized for Hadoop
  ↑ Already proven
  ↑ Comfortable
  ↓ Commercial or restrictive license
  ↓ No support for other apps/libs beyond Java/Hadoop/the Hadoop Eco
Other solutions
• Hadoop Managers
• Provisioning Tools
• Parallel SSH Tools
Img: http://www.bizbuilder.com/how-much-does-an-inexpensive-franchise-cost/
– I have no money
– I want to use extra resources
※ Recently there have been many changes in license policies. Please check!!
Examination (2/3)
• Other provisioning tools
  ↑ Powerful
  ↑ Many features
  ↑ Detailed control
  ↓ Complexity
  ↓ Need a lot of study
Other solutions
• Hadoop Managers
• Provisioning Tools
• Parallel SSH Tools
Source: www.mbc.co.kr
– I don't like to study
Examination (3/3)
• Other parallel SSH tools
  ↑ Simple
  ↑ Useful
  ↑ No need to install an extra agent
  ↓ Some features are insufficient
  ↓ All exceptional cases must be handled yourself
Other solutions
• Hadoop Managers
• Provisioning Tools
• Parallel SSH Tools
Img: http://bluebuddies.com/Smurfs_Panini_Smurf_Stickers-7.htm
– Yes, I'm greedy
So, what is a
● Simple &
● Powerful &
● No-cost &
● Expandable &
● Smart way???
Img: http://plug.hani.co.kr/heihei9999/459415
I have found a solution
It is Rex!!
http://rexify.org/
Rex is
● uses just SSH
● no agent required
● seamless integration
● no conflicts
● easy to use
● easy to extend
● easy to learn
● can use the full power of Perl
Img: http://swapiinthehouse.blogspot.kr/2012/02/final-term-was-over-and-let-holiday.html
Rex options
[onycom@onydev: ~]$ rex -h
(R)?ex - (Remote)? Execution
-b   Run batch
-e   Run the given code fragment
-E   Execute task on the given environment
-H   Execute task on these hosts
-G   Execute task on these group
-u   Username for the ssh connection
-p   Password for the ssh connection
-P   Private Keyfile for the ssh connection
-K   Public Keyfile for the ssh connection
-T   List all known tasks.
-Tv  List all known tasks with all information.
-f   Use this file instead of Rexfile
-h   Display this help
-M   Load Module instead of Rexfile
-v   Display (R)?ex Version
-F   Force. Don't regard lock file
-s   Use sudo for every command
-S   Password for sudo
-d   Debug
-dd  More Debug (includes Profiling Output)
-o   Output Format
-c   Turn cache ON
-C   Turn cache OFF
-q   Quiet mode. No Logging output
-Q   Really quiet. Output nothing.
-t   Number of threads to use
Basic Grammar - Authentication
From: http://www.slideshare.net/jfried/rex-25172864?from_search=3
Basic Grammar - Server Group
From: http://www.slideshare.net/jfried/rex-25172864?from_search=3
Basic Grammar - Task
From: http://www.slideshare.net/jfried/rex-25172864?from_search=3
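The three grammar slides above are images, so here is a minimal sketch of the same three constructs in a Rexfile, in the style of the deck's later examples. Host names and credentials are placeholders rather than anything shown on those slides:

```perl
# Rexfile - a plain Perl/Rex DSL file interpreted by the rex command

# Authentication: credentials for the SSH connections
user "onycom";
password "blabla";        # or private_key/public_key for key-based auth

# Server group: a named set of hosts; the [0..2] range is expanded by Rex
group "hadoop_node" => "vmaster", "vnode[0..2]";

# Task: a named unit of work that runs on a host or group
desc "Show the uptime of every hadoop node";
task "uptime", group => "hadoop_node", sub {
    say run "uptime";     # run executes the command on the remote host
};
```

Run it with `rex uptime`, or list the defined tasks with `rex -T`.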
Let's get down to the main subject!
Constructing Hadoop with (R)?ex
This presentation is about
● How to easily install & configure Hadoop
  – Not "How to optimize & performance-tune"
● For easy understanding,
  – exceptional cases are excluded
● No explanation of OS installation
  – no discussion of "PXE/kickstart"
● Reduced environment conditions
  – e.g. security, network, other servers/apps, …
● I'll avoid talking about the Perl language as much as possible
  – It is not needed
● TMTOWTDI
  – Even if it's not refined, I'll show a variety of ways where possible
Network Topology
• Onydev (Provision Server)
• vmaster (Name node / Job Tracker)
• vnode0, vnode1, vnode2 (Data nodes)
• vmonitor (Monitoring Server)
• all connected via an L2 switch
[Spec]
• Machines: 6 (the Hadoop cluster itself uses just 4)
• OS: CentOS 6.4 64-bit
• Memory: 32 GB (NN), 16 GB (DN)
• CPU: 4 cores (i7, 3.5 GHz)
• Interface: 1G Ethernet
• Disk: 250 GB SSD, 1 TB HDD
※ I've configured the NN and JT on the same machine
Our Hadoop env. is
● There is one control account
  – 'hadoop-user'
● Hadoop & the Hadoop eco are installed under the 'hadoop-user' account
Prepare – All machines
● On each machine,
  – the same OS version should be installed (at least within the Hadoop cluster)
  – a fixed IP address is assigned
  – SSH connections are possible
  – at least one normal user account exists, with a sudoers entry for it (optional)
Prepare – Provision Server (1/2)
● Development tools & environment
  – e.g. gcc, glib, make/cmake, perl, etc.
● Install Perl modules
  – yum install perl-ExtUtil*
  – yum install perl-CPAN*
  – execute the 'cpan' command
Prepare – Provision Server (2/2)
● After executing the 'cpan' command
  – cpan 3> install Rex
  – It may fail!!
  – This whole story is based on CentOS 6.x
● So, I recommend 'perlbrew'
  – if you want more of Perl's power
※ My guess: Red Hat may dislike the Perl language
To Install Rex (1/3)
adduser brew-user
passwd brew-user
curl -L http://install.perlbrew.pl | bash
cd /home
chmod 755 brew-user
cd ~brew-user
chmod -R 755 ./perl5
echo 'export PERLBREW_ROOT="/home/brew-user/perl5/perlbrew"' >> /home/brew-user/.bashrc
## Append "$PERLBREW_ROOT/bin" to PATH in .bashrc
source ~brew-user/.bashrc
To Install Rex (2/3)
## In the brew-user account,
perlbrew init
perlbrew available
### Choose the recommended stable perl 5.18.0 (as of 2013/07/11)
perlbrew install perl-5.18.0
perlbrew switch perl-5.18.0
[brew-user@onydev: ~]$ perlbrew switch perl-5.18.0
Use of uninitialized value in split at /loader/0x1f2f458/App/perlbrew.pm line 34.
.........
A sub-shell is launched with perl-5.18.0 as the activated perl. Run 'exit' to finish it.
To Install Rex (3/3)
● cpanm Rex
● cpan
● http://rexify.org/get/
Test for Rex
[onycom@onydev: ~]$ which rex
/home/brew-user/perl5/perlbrew/perls/perl-5.18.0/bin/rex
[onycom@onydev: ~]$ rex -H localhost -u onycom -p blabla -e "say run 'hostname'"
[2013-10-08 15:36:06] INFO - Running task eval-line on localhost
[2013-10-08 15:36:06] INFO - Connecting to localhost:22 (onycom)
[2013-10-08 15:36:07] INFO - Connected to localhost, trying to authenticate.
[2013-10-08 15:36:07] INFO - Successfully authenticated on localhost.
onydev
[onycom@onydev: ~]$
● Rexfile
● a plain text file
/etc/hosts - Provision Server
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
... skip .................
192.168.2.100 onydev
... skip .................
192.168.2.51 vmaster
192.168.2.52 vnode0
192.168.2.53 vnode1
192.168.2.54 vnode2
192.168.2.59 vmonitor
SSH connection
● Between
  – the provision server and the other target servers
  – the Hadoop master node and the data nodes
Prepare SSH public key
[onycom@onydev: ~]$ ssh-keygen -t rsa
Enter file in which to save the key (/home/onycom/.ssh/id_rsa):
Created directory '/home/onycom/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/onycom/.ssh/id_rsa.
Your public key has been saved in /home/onycom/.ssh/id_rsa.pub.
Create User
use Rex::Commands::User;
group "hadoop_node" => "vmaster", "vnode[0..2]";
group "all_vm_node" => "vmaster", "vnode[0..2]", "vmonitor";
my $USER = "hadoop-user";

desc "Create user";
task "new_user", group => "all_vm_node", sub {
    create_user "$USER",
        home     => "/home/$USER",
        comment  => "Account for hadoop",
        password => "blabla",
};

[onycom@onydev: Prov]$ rex -f ./hd-su.Rexfile -u root -p <pass> new_user
Setup SSH for user
desc "setup ssh for user";
task "setup_ssh_user", group => "all_vm_node", sub {
    run "mkdir /home/$USER/.ssh";
    file "/home/$USER/.ssh/authorized_keys",
        source => "/home/onycom/.ssh/id_rsa.pub",
        owner  => "$USER",
        group  => "$USER",
        mode   => 644;
    run "chmod 700 /home/$USER/.ssh";
};

[onycom@onydev: Prov]$ rex -f ./hd-su.Rexfile -u hadoop-user -p <pass> setup_ssh_user
※ OK!! Done. Now you can log in to each server without a password.
Then do the same for the Hadoop NN/DN nodes.
Install packages
parallelism 4;

desc "Install packages for java";
task "install_java", group => "all_vm_node", sub {
    install package => "java-1.6.*";
};

[onycom@onydev: Prov]$ rex -f ./hd-su.Rexfile -u root -p <pass> install_java
• Some packages should be installed globally (e.g. java, wget, etc.)
• For Hadoop 1.1.x, Java 1.6 is recommended
• Use the parallelism keyword (if a long run time is expected)
Install hadoop (1/3)
hd1.Rexfile:
user "hadoop-user";
private_key "/home/onycom/.ssh/id_rsa";
public_key "/home/onycom/.ssh/id_rsa.pub";
group "hadoop_node" => "vmaster", "vnode[0..2]";
group "all_vm_node" => "vmaster", "vnode[0..2]", "vmonitor";

desc "prepare_dir";
task "prepare_dir", group => "hadoop_node", sub {
    run "mkdir Work";
    run "mkdir Download";
    run "mkdir src";
    run "mkdir tmp";
};

[onycom@onydev: Prov]$ rex -f ./hd1.Rexfile prepare_dir
Install hadoop (2/3)
desc "hadoop 1.1.2 download with wget";
task "get_hadoop", group => "hadoop_node", sub {
    my $f = run "wget http://archive.apache.org/dist/hadoop/core/hadoop-1.1.2/hadoop-1.1.2.tar.gz",
        cwd => "/home/hadoop-user/src";
    say $f;
};
...skip....
desc "pig 0.11.1 download with wget";
task "get_pig", group => "hadoop_node", sub {
    my $f = run "wget http://apache.tt.co.kr/pig/pig-0.11.1/pig-0.11.1.tar.gz",
        cwd => "/home/hadoop-user/src";
    say $f;
};
! The Hadoop version & the Hadoop eco S/W versions should match.
That topic is outside the scope of this presentation.
Install hadoop (3/3)
my $HADOOP_SRC_DIR = "/home/hadoop-user/src";

desc "unzip hadoop source files";
task "unzip_src", group => "hadoop_node", sub {
    run "tar xvfz hadoop-1.1.2.tar.gz", cwd => "$HADOOP_SRC_DIR";
    run "tar xvfz hive-0.11.0.tar.gz", cwd => "$HADOOP_SRC_DIR";
    run "tar xvfz pig-0.11.1.tar.gz", cwd => "$HADOOP_SRC_DIR";
};

desc "make links for hadoop source files";
task "link_src", group => "hadoop_node", sub {
    run "ln -s ./hadoop-1.1.2 ./hadoop", cwd => $HADOOP_SRC_DIR;
    run "ln -s ./hive-0.11.0 ./hive", cwd => $HADOOP_SRC_DIR;
    run "ln -s ./pig-0.11.1 ./pig", cwd => $HADOOP_SRC_DIR;
};
Configuration files (1/3)
● System
  – /etc/hosts
● Hadoop (../hadoop/conf)
  – masters & slaves
  – hadoop-env.sh
  – hdfs-site.xml
  – core-site.xml
  – mapred-site.xml
Configuration files (2/3)
● Hadoop eco systems & other tools
  – e.g. Ganglia
  – e.g. Flume (agent/collector/master)
  – e.g. Oozie or Flamingo
  – These are skipped in this PPT.
● User rc file
These are just defaults; optimization is not considered.
Configuration files (3/3)
Provision Server --(R)?ex over SSH/SCP--> Hadoop NN, Hadoop DN 1 ... Hadoop DN n
Hadoop configuration files live in ../hadoop_conf_repo on the provision server
※ Of course, this is just my policy
Edit hosts file
my $target_file = "/etc/hosts";
my $host_list = <<'END';
192.168.2.51 vmaster
192.168.2.52 vnode0
192.168.2.53 vnode1
192.168.2.54 vnode2
192.168.2.59 vmonitor
END

desc "Add hosts";
task "add_host", group => "all_vm_node", sub {
    my $exist_cnt = cat $target_file;
    my $fh = file_write $target_file;
    $fh->write($exist_cnt);
    $fh->write($host_list);
    $fh->close;
};
※ You can consider the 'Augeas' tool to handle system files. Please refer to 'Rex::Augeas' or 'http://augeas.net'.
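As an aside, rewriting the whole file this way is not idempotent: running the add_host task twice appends the host list twice. A hedged sketch of an alternative (not from the deck), assuming a Rex version that provides Rex::Commands::File's append_if_no_such_line:

```perl
# Hypothetical idempotent variant of add_host; same hosts as above.
my %hosts = (
    "192.168.2.51" => "vmaster",
    "192.168.2.52" => "vnode0",
    "192.168.2.53" => "vnode1",
    "192.168.2.54" => "vnode2",
    "192.168.2.59" => "vmonitor",
);

desc "Add hosts (idempotent)";
task "add_host_idem", group => "all_vm_node", sub {
    for my $ip (sort keys %hosts) {
        # appends "IP name" to /etc/hosts only if no such line exists yet
        append_if_no_such_line "/etc/hosts", "$ip $hosts{$ip}";
    }
};
```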
Setup .bashrc for user (1/2)
... skip .....
my $hadoop_rc = <<'END';
#Hadoop Configuration
export JAVA_HOME="/usr/lib/jvm/jre-1.6.0-openjdk.x86_64"
export CLASSPATH="$JAVA_HOME/lib:$JAVA_HOME/lib/ext"
export HADOOP_USER="/home/hadoop-user"
export HADOOP_SRC="$HADOOP_USER/src"
export HADOOP_HOME="$HADOOP_USER/hadoop"
export PIG_HOME="$HADOOP_SRC/pig"
export HIVE_HOME="$HADOOP_SRC/hive"
END
... skip .....
Setup .bashrc for user (2/2)
desc "setup hadoop-user's .rc file";
task "setup_rc_def", group => "hadoop_node", sub {
    my $fh = file_append ".bashrc";
    $fh->write($base_rc);
    $fh->write($hadoop_rc);
    $fh->close();
};

desc "setup hadoop master node .rc file";
task "setup_rc_master", "vmaster", sub {
    my $fh = file_append ".bashrc";
    $fh->write($master_rc);
    $fh->close();
};
.......... skip ............
Configure Hadoop (1/6)
● 'masters'
[hadoop-user@vmaster: ~]$ cd hadoop/conf
[hadoop-user@vmaster: conf]$ cat masters
vmaster
● 'slaves'
[hadoop-user@vmaster: conf]$ cat slaves
vnode0
vnode1
vnode2
Configure Hadoop (2/6)
• hadoop-env.sh
... skip ...
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64
#hadoop-user
#Remove the warning message: "HADOOP_HOME is deprecated"
export HADOOP_HOME_WARN_SUPPRESS=TRUE
Configure Hadoop (3/6)
• hdfs-site.xml
... skip ...
<configuration>
<!-- modified by hadoop-user -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop-user/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop-user/hdfs/data</value>
  </property>
</configuration>
※ This 'replication' value depends on your env.
Configure Hadoop (4/6)
• core-site.xml
... skip ...
<configuration>
<!-- modified by hadoop-user -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://vmaster:9000</value>
  </property>
</configuration>
Configure Hadoop (5/6)
• mapred-site.xml
.. skip ..
<property>
  <name>mapred.job.tracker</name>
  <value>vmaster:9001</value>
</property>
<!-- 2013.9.11. Increase the timeout, for the "failed to report status" error -->
<property>
  <name>mapred.task.timeout</name>
  <value>1800000</value>
  <description>The number of milliseconds before a task will be terminated
  if it neither reads an input, writes an output, nor updates its status
  string.</description>
</property>
※ This 'timeout' value just depends on your env.
Configure Hadoop (6/6)
my $CNF_REPO = "hadoop_conf_repo";
... skip ...
my $MAPRED = "mapred-site.xml";
task "upload_mapred", group => "hadoop_node", sub {
    file "$HD_CNF/$MAPRED",
        owner  => $HADOOP_USER,
        group  => $HADOOP_USER,
        source => "$CNF_REPO/$MAPRED";
};

my $CORE_SITE = "core-site.xml";
task "upload_core", group => "hadoop_node", sub {
    file "$HD_CNF/$CORE_SITE",
        owner  => $HADOOP_USER,
        group  => $HADOOP_USER,
        source => "$CNF_REPO/$CORE_SITE";
};
... skip ....
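The per-file upload tasks differ only in the file name, so the repetition (including the skipped tasks for the remaining config files) could be collapsed into one loop. A sketch, assuming the same $HD_CNF, $HADOOP_USER and $CNF_REPO layout as above; the consolidated task name is mine:

```perl
# Hypothetical consolidated upload task; the file names are the ones
# listed on the "Configuration files" slide, variables as defined above.
my @conf_files = qw(core-site.xml hdfs-site.xml mapred-site.xml
                    masters slaves hadoop-env.sh);

desc "upload all hadoop config files from the repo";
task "upload_all_conf", group => "hadoop_node", sub {
    for my $cnf (@conf_files) {
        file "$HD_CNF/$cnf",
            owner  => $HADOOP_USER,
            group  => $HADOOP_USER,
            source => "$CNF_REPO/$cnf";
    }
};
```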
Before going any further
● Stop SELinux
  – if it is enforcing
● Modify the iptables policy
  – I recommend stopping it while the configuration work is in progress
Let's start hadoop
● Log in to the master node as hadoop-user
  – ssh -X hadoop-user@vmaster
● Format the namenode
  – hadoop namenode -format
● Execute the start script
  – e.g. start-all.sh
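These manual steps could themselves be a one-off Rex task run from the provision server. A hedged sketch (the task name is mine; it assumes hadoop's bin directory is on the PATH of non-interactive shells, and note that formatting the namenode is destructive and should run exactly once):

```perl
# Hypothetical one-shot bootstrap task for the master node only.
desc "format the namenode and start the cluster (run once!)";
task "start_hadoop", "vmaster", sub {
    run "hadoop namenode -format";  # destructive: wipes dfs.name.dir
    say run "start-all.sh";         # starts NN/JT here, DN/TT on the slaves
};
```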
Check hadoop status
[hadoop-user@vmaster: ~]$ jps -l
22161 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
22260 org.apache.hadoop.mapred.JobTracker
21968 org.apache.hadoop.hdfs.server.namenode.NameNode
27896 sun.tools.jps.Jps
[hadoop-user@vmaster: ~]$ hadoop fs -ls /
Found 1 items
drwxr-xr-x - hadoop-user supergroup 0 2013-10-07 20:33 /tmp
※ It seems to be OK. Really?
But, life is not easy
Img: http://www.trulygraphics.com/tg/weekend/
Check status of all DNs
task "show_jps", "vnode[0..2]", sub {
    say run "hostname";
    my $r = run "jps";
    say $r;
};

[onycom@onydev: Prov]$ rex -f ./hd2.Rexfile show_jps
vnode0
12682 Jps
12042 TaskTracker
11934 DataNode
vnode1
11669 DataNode
11778 TaskTracker
12438 Jps
vnode2
11128 DataNode
11237 TaskTracker
11895 Jps
If there is some problem,
● Check again (on each node)
  – /etc/hosts
  – SELinux & iptables
  – name & data directories / permissions in HDFS
  – and so on...
Img: http://blog.lib.umn.edu/isss/undergraduate/2011/11/you-do-have-any-tech-problem.html
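The checks above can also be scripted from the provision server instead of logging in node by node. A hedged sketch (the task and output format are mine, not from the deck) that reports SELinux and iptables state on every node, using standard CentOS 6 commands:

```perl
# Hypothetical diagnostic task; assumes the "all_vm_node" group defined
# earlier and root (or sudo) rights for the iptables service query.
desc "report selinux & iptables state on every node";
task "check_env", group => "all_vm_node", sub {
    say run "hostname";
    say "selinux: " . run "getenforce";          # Enforcing/Permissive/Disabled
    say run "service iptables status | head -1"; # firewall service state
};
```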
If you did not meet any problems, or fixed them,
Now you have Hadoop & an automatic management/provisioning solution
Img: https://hadoopworld2011.eventbrite.com/ (yonhap)
Advanced Challenges
What more can we do? (1/2)
● add/remove data nodes
● add/remove storage
● integrate with monitoring
  – e.g. Ganglia/Nagios
● integrate with other Hadoop eco tools
  – Flume, Flamingo, Oozie
● integrate other devices or servers
  – e.g. switches, DB servers
What more can we do? (2/2)
● sophisticated Hadoop parameter control
  – e.g. using XML parsing
● workflow control & batch
● backup
● periodic file system management
  – e.g. log files
● web GUI
● make a framework for your purpose
Ref.
• http://hadoop.apache.org/
• http://pig.apache.org/
• http://hive.apache.org/
• http://confluence.openflamingo.org
• http://www.openankus.org
• http://www.rexify.org
• https://groups.google.com/forum/#!forum/rex-users
• http://modules.rexify.org/search?q=hadoop
Img: http://www.projects2crowdfund.com/what-can-i-do-with-crowdfunding/
Thanks!
junkim@onycom.com / rainmk6@gmail.com