Hadoop meets Rex (How to construct a Hadoop cluster with Rex)

I'll attend an open source seminar ("Open source frameworks for practical Hadoop").
This is my presentation file.

  • These are the presentation materials for the seminar on 2013.10.10 (Thu).

    1. Introduction to Hadoop system installation and operation
    http://www.slideshare.net/womendevel/hadoop-prov-rex
    2. Introduction to big data analysis algorithms and case studies
    http://www.slideshare.net/womendevel/big-data-analytics-and-data-mining
    3. Data analysis case studies using big data analysis open source
    http://www.slideshare.net/womendevel/mapreduce-based


  1. Hadoop meets (R)?ex - How to use Rex to construct a Hadoop cluster. Original Rex base image: http://rexify.org (2013-08-26). Original Hadoop image: http://hadoop.apache.org
  2. Background
  3. Mission • I'm not a S/W developer any more • I'm not a system engineer • But I had to construct a Hadoop cluster – moreover, in various configurations... http://www.gkworld.com/product/GKW49102/Simpsons-Cruel-Fate-Why-Mock-Me-Homer-Magnet-SM130.html
  4. Hadoop is • A Hadoop cluster consists of many Linux boxes • Hadoop has many configuration files and parameters • Besides Hadoop itself, a variety of S/W from the Hadoop ecosystem has to be installed • Beyond Hadoop & the Hadoop ecosystem, many other kinds of S/W have to be installed & configured – Tomcat, Apache, DBMS, other development tools, other utils/libs… • And so on…
  5. At first, • I did it manually – Install & configure… – Install & configure – Install & configure – Install & configure – …. Img: http://www.construire-en-vendee.fr/la-construction-dune-maison-de-a-a-z-les-fondations.html
  6. Tiresome!! • It is a really tedious & horrible job!! Img: http://cuteoverload.com/2009/08/17/your-story-has-become-tiresome/
  7. Finding another way • I decided to find another way!! • I started to survey other solutions. Img: http://www.101-charger.com/wallpapers/21526,jeux,gratuit,pathfinder,7.html
  8. Survey
  9. A variety of solutions • Hadoop Managers • Provisioning Tools • Parallel SSH Tools http://www.cbsnews.com/8301-505125_162-31042083/duke-research-monkeys-like-humans-want-variety/
  10. Hadoop Managers • Hortonworks Management Center™ • Cloudera's CDH™ • Apache Ambari
  11. Provisioning Tools • Fabric (Python)
  12. Parallel SSH Tools • http://dev.naver.com/projects/dist/ • https://code.google.com/p/parallel-ssh/ • http://sourceforge.net/projects/clusterssh/
  13. Examination(1/3) • Hadoop Managers ↑ Specialized for Hadoop ↑ Already proven ↑ Comfortable ↓ Commercial or restrictive license ↓ No support for other apps/libs beyond Java/Hadoop/Hadoop Eco
  14. Other solutions • Hadoop Managers • Provisioning Tools • Parallel SSH Tools http://www.bizbuilder.com/how-much-does-an-inexpensive-franchise-cost/  I have no money  I want to use more extra resources ※ Recently, there have been many changes in license policies. Please check!!
  15. Examination(2/3) • Other provisioning tools ↑ Powerful ↑ Many features ↑ Detailed control ↓ Complexity ↓ Need a lot of study
  16. Other solutions • Hadoop Managers • Provisioning Tools • Parallel SSH Tools source: www.mbc.co.kr  I don't like to study
  17. Examination(3/3) • Other parallel SSH tools ↑ Simple ↑ Useful ↑ No need to install an extra agent ↓ Some features are insufficient ↓ All exceptional cases must be handled yourself
  18. Other solutions • Hadoop Managers • Provisioning Tools • Parallel SSH Tools http://bluebuddies.com/Smurfs_Panini_Smurf_Stickers-7.htm  Yes, I'm greedy
  19. So, what is ● Simple & ● Powerful & ● No cost & ● Expandable & ● Smart??? http://plug.hani.co.kr/heihei9999/459415
  20. I have found a solution
  21. It is Rex!! http://rexify.org/
  22. Rex is ● uses just SSH ● no agent required ● seamless integration ● no conflicts ● easy to use ● easy to extend ● easy to learn ● can use the advanced power of Perl http://swapiinthehouse.blogspot.kr/2012/02/final-term-was-over-and-let-holiday.html
  23. Rex options
    [onycom@onydev: ~]$ rex -h
    (R)?ex - (Remote)? Execution
    -b    Run batch
    -e    Run the given code fragment
    -E    Execute task on the given environment
    -H    Execute task on these hosts
    -G    Execute task on these groups
    -u    Username for the ssh connection
    -p    Password for the ssh connection
    -P    Private keyfile for the ssh connection
    -K    Public keyfile for the ssh connection
    -T    List all known tasks
    -Tv   List all known tasks with all information
    -f    Use this file instead of Rexfile
    -h    Display this help
    -M    Load module instead of Rexfile
    -v    Display (R)?ex version
    -F    Force. Don't regard lock file
    -s    Use sudo for every command
    -S    Password for sudo
    -d    Debug
    -dd   More debug (includes profiling output)
    -o    Output format
    -c    Turn cache ON
    -C    Turn cache OFF
    -q    Quiet mode. No logging output
    -Q    Really quiet. Output nothing
    -t    Number of threads to use
  24. Basic Grammar - Authentication. From: http://www.slideshare.net/jfried/rex-25172864?from_search=3
  25. Basic Grammar - Server Group. From: http://www.slideshare.net/jfried/rex-25172864?from_search=3
  26. Basic Grammar - Task. From: http://www.slideshare.net/jfried/rex-25172864?from_search=3
  27. Let's get down to the main subject!
  28. Construct Hadoop with (R)?ex
  29. This presentation is ● How to easily install & configure Hadoop – not "how to optimize & performance-tune" ● For easy understanding, – exceptional cases are excluded ● No explanation of OS installation – no discussion of "PXE/kickstart" ● Reduced environment conditions – e.g. security, network, other servers/apps, … ● I'll talk about the Perl language as little as possible – it is not needed ● TMTOWTDI – even if it's not refined, I'll show a variety of ways where possible
  30. Network topology: vmaster (Name node/Job Tracker), L2 switch, onydev (Provision Server), vnode0 (Data node), vnode1 (Data node), vnode2 (Data node), vmonitor (Monitoring Server). [Spec]  Machines: 6 ea (the hadoop cluster uses just 4 ea)  OS: CentOS 6.4 64bit  Memory: 32GB (NN), 16GB (DN)  CPU: 4 cores (i7, 3.5GHz)  Interface: 1G Ethernet  Disk: 250G SSD, 1T HDD ※ I've configured the NN and JT on the same machine
  31. Our hadoop env. is ● There is one control account – 'hadoop-user' ● Hadoop & the Hadoop ecosystem are installed under the 'hadoop-user' account
  32. Prepare – all machines ● Each machine – has the same OS version installed (at least within the hadoop cluster) – has its own fixed IP address – can be connected to with SSH – has at least one normal user account & its sudoers entry edited (optional)
  33. Prepare – Provision Server(1/2) ● Development tools & environment – e.g. gcc, glib, make/cmake, perl, etc... ● Install Perl modules – yum install perl-ExtUtil* – yum install perl-CPAN* – execute the 'cpan' command
  34. Prepare – Provision Server(2/2) ● After executing the 'cpan' command – cpan 3> install Rex – You may get a failure!! – This whole story is based on CentOS 6.XX ● So, I recommend 'perlbrew' – if you want to use more of Perl's power ※ My guess is that Red Hat may dislike the Perl language
  35. To Install Rex (1/3)
    adduser brew-user
    passwd brew-user
    curl -L http://install.perlbrew.pl | bash
    cd /home
    chmod 755 brew-user
    cd ~brew-user
    chmod -R 755 ./perl5
    echo 'export PERLBREW_ROOT="/home/brew-user/perl5/perlbrew"' >> /home/brew-user/.bashrc
    ## Append "$PERLBREW_ROOT/bin" to PATH in the .bashrc
    source ~brew-user/.bashrc
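A note on the echo line above: the slide originally nested double quotes, which the shell strips before writing, so the inner quotes were lost. Single-quoting the whole string preserves them; a minimal sketch against a scratch rc file (the path is the one from the slide, used illustratively):

```shell
# Scratch file standing in for /home/brew-user/.bashrc (illustrative path).
rc=$(mktemp)
# Single quotes keep the inner double quotes literal in the written lines.
echo 'export PERLBREW_ROOT="/home/brew-user/perl5/perlbrew"' >> "$rc"
echo 'export PATH="$PERLBREW_ROOT/bin:$PATH"' >> "$rc"
```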
  36. To Install Rex (2/3)
    ## In the brew-user account,
    perlbrew init
    perlbrew available
    ### Choose the recommended stable perl 5.18.0 (as of 2013/07/11)
    perlbrew install perl-5.18.0
    perlbrew switch perl-5.18.0
    [brew-user@onydev: ~]$ perlbrew switch perl-5.18.0
    Use of uninitialized value in split at /loader/0x1f2f458/App/perlbrew.pm line 34.
    .........
    A sub-shell is launched with perl-5.18.0 as the activated perl. Run 'exit' to finish it.
  37. To Install Rex (3/3) ● cpanm Rex ● cpan ● http://rexify.org/get/
  38. Test for Rex
    [onycom@onydev: ~]$ which rex
    /home/brew-user/perl5/perlbrew/perls/perl-5.18.0/bin/rex
    [onycom@onydev: ~]$ rex -H localhost -u onycom -p blabla -e "say run 'hostname'"
    [2013-10-08 15:36:06] INFO - Running task eval-line on localhost
    [2013-10-08 15:36:06] INFO - Connecting to localhost:22 (onycom)
    [2013-10-08 15:36:07] INFO - Connected to localhost, trying to authenticate.
    [2013-10-08 15:36:07] INFO - Successfully authenticated on localhost.
    onydev
    [onycom@onydev: ~]$
    ● Rexfile ● a plain text file
  39. /etc/hosts - Provision Server
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    ... skip .................
    192.168.2.100 onydev
    ... skip .................
    192.168.2.51 vmaster
    192.168.2.52 vnode0
    192.168.2.53 vnode1
    192.168.2.54 vnode2
    192.168.2.59 vmonitor
  40. SSH connection ● Between – the provision server and the other target servers – the Hadoop master node and the data nodes
  41. Prepare SSH public key
    [onycom@onydev: ~]$ ssh-keygen -t rsa
    Enter file in which to save the key (/home/onycom/.ssh/id_rsa):
    Created directory '/home/onycom/.ssh'.
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /home/onycom/.ssh/id_rsa.
    Your public key has been saved in /home/onycom/.ssh/id_rsa.pub.
  42. Create User
    use Rex::Commands::User;
    group "hadoop_node" => "vmaster", "vnode[0..2]";
    group "all_vm_node" => "vmaster", "vnode[0..2]", "vmonitor";
    my $USER = "hadoop-user";
    desc "Create user";
    task "new_user", group => "all_vm_node", sub {
        create_user "$USER",
            home     => "/home/$USER",
            comment  => "Account for hadoop",
            password => "blabla",
    };
    [onycom@onydev: Prov]$ rex -f ./hd-su.Rexfile -u root -p <pass> new_user
  43. Setup SSH for user
    desc "setup ssh for user";
    task "setup_ssh_user", group => "all_vm_node", sub {
        run "mkdir /home/$USER/.ssh";
        file "/home/$USER/.ssh/authorized_keys",
            source => "/home/onycom/.ssh/id_rsa.pub",
            owner  => "$USER",
            group  => "$USER",
            mode   => 644;
        run "chmod 700 /home/$USER/.ssh";
    };
    [onycom@onydev: Prov]$ rex -f ./hd-su.Rexfile -u hadoop-user -p <pass> setup_ssh_user
    ※ OK!! Done. Now you can log in to each server without a password. Then, do the same thing for the Hadoop NN/DN nodes.
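The mode => 644 and chmod 700 in the task above are not arbitrary: sshd refuses to use authorized_keys when ~/.ssh is group- or world-writable. A minimal local sketch of the expected layout, with a scratch directory standing in for /home/hadoop-user:

```shell
# Scratch directory standing in for the hadoop-user home directory.
home=$(mktemp -d)
mkdir "$home/.ssh"
touch "$home/.ssh/authorized_keys"
chmod 700 "$home/.ssh"                   # sshd requires a private .ssh dir
chmod 644 "$home/.ssh/authorized_keys"   # matches mode => 644 in the Rex task
```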
  44. Install packages
    parallelism 4;
    desc "Install packages for java";
    task "install_java", group => "all_vm_node", sub {
        install package => "java-1.6.*";
    };
    [onycom@onydev: Prov]$ rex -f ./hd-su.Rexfile -u root -p <pass> install_java
    • Some packages should be installed globally (e.g. java, wget, etc.)
    • For hadoop 1.1.x, java 1.6 is recommended.
    • Use the parallelism keyword (if a long run time is expected)
  45. Install hadoop(1/3) – hd1.Rexfile
    user "hadoop-user";
    private_key "/home/onycom/.ssh/id_rsa";
    public_key "/home/onycom/.ssh/id_rsa.pub";
    group "hadoop_node" => "vmaster", "vnode[0..2]";
    group "all_vm_node" => "vmaster", "vnode[0..2]", "vmonitor";
    desc "prepare_dir";
    task "prepare_dir", group => "hadoop_node", sub {
        run "mkdir Work";
        run "mkdir Download";
        run "mkdir src";
        run "mkdir tmp";
    };
    [onycom@onydev: Prov]$ rex -f ./hd1.Rexfile prepare_dir
  46. Install hadoop(2/3)
    desc "hadoop 1.1.2 download with wget";
    task "get_hadoop", group => "hadoop_node", sub {
        my $f = run "wget http://archive.apache.org/dist/hadoop/core/hadoop-1.1.2/hadoop-1.1.2.tar.gz", cwd => "/home/hadoop-user/src";
        say $f;
    };
    ...skip....
    desc "pig 0.11.1 download with wget";
    task "get_pig", group => "hadoop_node", sub {
        my $f = run "wget http://apache.tt.co.kr/pig/pig-0.11.1/pig-0.11.1.tar.gz", cwd => "/home/hadoop-user/src";
        say $f;
    };
    ! Hadoop ver. & Hadoop eco S/W versions should match. This topic is beyond the scope of this presentation.
  47. Install hadoop(3/3)
    my $HADOOP_SRC_DIR = "/home/hadoop-user/src";
    desc "unzip hadoop source files";
    task "unzip_src", group => "hadoop_node", sub {
        run "tar xvfz hadoop-1.1.2.tar.gz", cwd => "$HADOOP_SRC_DIR";
        run "tar xvfz hive-0.11.0.tar.gz", cwd => "$HADOOP_SRC_DIR";
        run "tar xvfz pig-0.11.1.tar.gz", cwd => "$HADOOP_SRC_DIR";
    };
    desc "make link for hadoop source files";
    task "link_src", group => "hadoop_node", sub {
        run "ln -s ./hadoop-1.1.2 ./hadoop", cwd => $HADOOP_SRC_DIR;
        run "ln -s ./hive-0.11.0 ./hive", cwd => $HADOOP_SRC_DIR;
        run "ln -s ./pig-0.11.1 ./pig", cwd => $HADOOP_SRC_DIR;
    };
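The link_src task relies on the versioned-directory-plus-symlink convention, which makes a later upgrade a matter of repointing one link. A local sketch of the same layout in a scratch directory (version numbers taken from the slides):

```shell
src=$(mktemp -d)                              # stands in for /home/hadoop-user/src
mkdir "$src/hadoop-1.1.2"
(cd "$src" && ln -s ./hadoop-1.1.2 ./hadoop)  # 'hadoop' always points at the active version
# Upgrading later would just be: rm hadoop; ln -s ./hadoop-1.2.x ./hadoop
```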
  48. Configuration files(1/3) ● System – /etc/hosts ● Hadoop (../hadoop/conf) – masters & slaves – hadoop-env.sh – hdfs-site.xml – core-site.xml – mapred-site.xml
  49. Configuration files(2/3) ● Hadoop ecosystem & other tools – e.g. Ganglia – e.g. Flume – agent/collector/master – e.g. Oozie or Flamingo – these are skipped in this PPT ● User rc files  These are just defaults & no optimization is considered
  50. Configuration files(3/3) Hadoop configuration files (../hadoop_conf_repo) are pushed from the Provision Server to the Hadoop NN and Hadoop DN 1 ... DN n over SSH/SCP by (R)?ex ※ Of course, this is just my policy
  51. Edit hosts file
    my $target_file = "/etc/hosts";
    my $host_list = <<'END';
    192.168.2.51 vmaster
    192.168.2.52 vnode0
    192.168.2.53 vnode1
    192.168.2.54 vnode2
    192.168.2.59 vmonitor
    END
    desc "Add hosts";
    task "add_host", group => "all_vm_node", sub {
        my $exist_cnt = cat $target_file;
        my $fh = file_write $target_file;
        $fh->write($exist_cnt);
        $fh->write($host_list);
        $fh->close;
    };
    ※ You can consider the 'Augeas' tool to handle system files. Please refer to 'Rex::Augeas' or http://augeas.net
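One caveat with the add_host task above: it rewrites the file by concatenating the old contents with $host_list, so running it twice duplicates every entry. A rough shell sketch of an idempotent variant, operating on a scratch copy (the add_host helper is illustrative; the addresses come from the topology slide):

```shell
hosts=$(mktemp)                      # scratch copy standing in for /etc/hosts
printf '127.0.0.1 localhost\n' > "$hosts"

# Append an entry only when its hostname is absent, so re-runs are idempotent.
add_host() {
    grep -qw "${1##* }" "$hosts" || printf '%s\n' "$1" >> "$hosts"
}

for entry in '192.168.2.51 vmaster' '192.168.2.52 vnode0' \
             '192.168.2.53 vnode1'  '192.168.2.54 vnode2' \
             '192.168.2.59 vmonitor'; do
    add_host "$entry"
    add_host "$entry"   # second call is a no-op
done
```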
  52. Setup .bashrc for user(1/2)
    ... skip .....
    my $hadoop_rc = <<'END';
    #Hadoop Configuration
    export JAVA_HOME="/usr/lib/jvm/jre-1.6.0-openjdk.x86_64"
    export CLASSPATH="$JAVA_HOME/lib:$JAVA_HOME/lib/ext"
    export HADOOP_USER="/home/hadoop-user"
    export HADOOP_SRC="$HADOOP_USER/src"
    export HADOOP_HOME="$HADOOP_USER/hadoop"
    export PIG_HOME="$HADOOP_SRC/pig"
    export HIVE_HOME="$HADOOP_SRC/hive"
    END
    ... skip .....
  53. Setup .bashrc for user(2/2)
    desc "setup hadoop-user's .rc file";
    task "setup_rc_def", group => "hadoop_node", sub {
        my $fh = file_append ".bashrc";
        $fh->write($base_rc);
        $fh->write($hadoop_rc);
        $fh->close();
    };
    desc "setup hadoop master node .rc file";
    task "setup_rc_master", "vmaster", sub {
        my $fh = file_append ".bashrc";
        $fh->write($master_rc);
        $fh->close();
    };
    .......... skip ............
  54. Configure Hadoop(1/6)
    ● 'masters'
    [hadoop-user@vmaster: ~]$ cd hadoop/conf
    [hadoop-user@vmaster: conf]$ cat masters
    vmaster
    ● 'slaves'
    [hadoop-user@vmaster: conf]$ cat slaves
    vnode0
    vnode1
    vnode2
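Since the slaves file is just one hostname per line, it can be generated from the same vnode[0..2] range the Rexfile groups use instead of being hand-edited. A shell sketch, with a scratch directory standing in for ../hadoop/conf:

```shell
conf=$(mktemp -d)                 # stands in for ../hadoop/conf
echo vmaster > "$conf/masters"
# Same host range as the "vnode[0..2]" group in the Rexfile.
for i in 0 1 2; do
    echo "vnode$i"
done > "$conf/slaves"
```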
  55. Configure Hadoop(2/6) • hadoop-env.sh
    ... skip ...
    # The only required environment variable is JAVA_HOME. All others are
    # optional. When running a distributed configuration it is best to
    # set JAVA_HOME in this file, so that it is correctly defined on
    # remote nodes.
    # The java implementation to use. Required.
    # export JAVA_HOME=/usr/lib/j2sdk1.5-sun
    export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64
    #hadoop-user
    #Remove the warning message "HADOOP_HOME is deprecated"
    export HADOOP_HOME_WARN_SUPPRESS=TRUE
  56. Configure Hadoop(3/6) • hdfs-site.xml
    ... skip ...
    <configuration>
    <!-- modified by hadoop-user -->
    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>
    <property>
      <name>dfs.name.dir</name>
      <value>/home/hadoop-user/hdfs/name</value>
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/home/hadoop-user/hdfs/data</value>
    </property>
    </configuration>
    ※ This 'replication' value depends on your env.
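With dfs.replication set to 2, every HDFS block is stored twice, so raw disk capacity is roughly halved. A back-of-the-envelope check for the cluster above, taking three data nodes with the 1T HDDs as about 1000 GB each (the numbers are illustrative):

```shell
nodes=3; disk_gb=1000; replication=2
# Usable capacity is raw capacity divided by the replication factor.
usable_gb=$(( nodes * disk_gb / replication ))
echo "usable HDFS capacity: ${usable_gb} GB"   # prints: usable HDFS capacity: 1500 GB
```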
  57. Configure Hadoop(4/6) • core-site.xml
    ... skip ...
    <configuration>
    <!-- modified by hadoop-user -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://vmaster:9000</value>
    </property>
    </configuration>
  58. Configure Hadoop(5/6) • mapred-site.xml
    .. skip ..
    <property>
      <name>mapred.job.tracker</name>
      <value>vmaster:9001</value>
    </property>
    <!-- 2013.9.11. Increase the timeout to fix the "failed to report status" error -->
    <property>
      <name>mapred.task.timeout</name>
      <value>1800000</value>
      <description>The number of milliseconds before a task will be terminated if it neither reads an input, writes an output, nor updates its status string.</description>
    </property>
    ※ This 'timeout' value just depends on your env.
  59. Configure Hadoop(6/6)
    my $CNF_REPO = "hadoop_conf_repo";
    ... skip ...
    my $MAPRED = "mapred-site.xml";
    task "upload_mapred", group => "hadoop_node", sub {
        file "$HD_CNF/$MAPRED",
            owner  => $HADOOP_USER,
            group  => $HADOOP_USER,
            source => "$CNF_REPO/$MAPRED";
    };
    my $CORE_SITE = "core-site.xml";
    task "upload_core", group => "hadoop_node", sub {
        file "$HD_CNF/$CORE_SITE",
            owner  => $HADOOP_USER,
            group  => $HADOOP_USER,
            source => "$CNF_REPO/$CORE_SITE";
    };
    ... skip ....
  60. Before going any further ● Stop selinux – if it is enforcing ● Modify the iptables policy – I recommend stopping it while doing the configuration work
  61. Let's start hadoop ● Log in to the master node as hadoop-user – ssh -X hadoop-user@vmaster ● Format the hadoop namenode – hadoop namenode -format ● Execute the start script – e.g. start-all.sh
  62. Check hadoop status
    [hadoop-user@vmaster: ~]$ jps -l
    22161 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
    22260 org.apache.hadoop.mapred.JobTracker
    21968 org.apache.hadoop.hdfs.server.namenode.NameNode
    27896 sun.tools.jps.Jps
    [hadoop-user@vmaster: ~]$ hadoop fs -ls /
    Found 1 items
    drwxr-xr-x - hadoop-user supergroup 0 2013-10-07 20:33 /tmp
    ※ It seems to be OK. Really?
  63. But life is not easy http://www.trulygraphics.com/tg/weekend/
  64. Check status for all DNs
    task "show_jps", "vnode[0..2]", sub {
        say run "hostname";
        my $r = run "jps";
        say $r;
    };
    [onycom@onydev: Prov]$ rex -f ./hd2.Rexfile show_jps
    vnode0
    12682 Jps
    12042 TaskTracker
    11934 DataNode
    vnode1
    11669 DataNode
    11778 TaskTracker
    12438 Jps
    vnode2
    11128 DataNode
    11237 TaskTracker
    11895 Jps
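The show_jps output still has to be eyeballed; a small shell helper can turn the same jps listing into a pass/fail check instead. The node_ok function is a hypothetical addition, and the sample listing is copied from the vnode0 output above:

```shell
# Succeeds only when the listing shows both expected worker daemons.
node_ok() {
    echo "$1" | grep -q DataNode && echo "$1" | grep -q TaskTracker
}

# Sample jps output, as printed for vnode0 on the slide.
sample='12682 Jps
12042 TaskTracker
11934 DataNode'

if node_ok "$sample"; then status=up; else status=down; fi
```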
  65. If there is some problem, ● Check again – /etc/hosts – selinux & iptables – the name & data dirs/permissions in HDFS – and so on... (on each node) http://blog.lib.umn.edu/isss/undergraduate/2011/11/you-do-have-any-tech-problem.html
  66. If you did not meet any problems, or fixed those,
  67. Now you have hadoop + an automatic MGM/Prov. solution https://hadoopworld2011.eventbrite.com/ (image: yonhap)
  68. Advanced Challenges
  69. What more can we do?(1/2) ● Add/remove data nodes ● Add/remove storage ● Integrate with monitoring – e.g. Ganglia/Nagios ● Integrate with other Hadoop eco tools – Flume, Flamingo, Oozie ● Integrate with other devices or servers – e.g. switches, DB servers
  70. What more can we do?(2/2) ● Sophisticated hadoop parameter control – e.g. use XML parsing ● Workflow control & batch ● Backup ● Periodic file system management – e.g. log files ● Web GUI ● Make a framework for your purpose
  71. Ref. • http://hadoop.apache.org/ • http://pig.apache.org/ • http://hive.apache.org/ • http://confluence.openflamingo.org • http://www.openankus.org • http://www.rexify.org • https://groups.google.com/forum/#!forum/rex-users • http://modules.rexify.org/search?q=hadoop
  72. http://www.projects2crowdfund.com/what-can-i-do-with-crowdfunding/
  73. Thanks! junkim@onycom.com / rainmk6@gmail.com
