Hadoop meet Rex (How to construct a Hadoop cluster with Rex)

I'll attend an open source seminar ("open source framework for practical hadoop").
This is my presentation file.

  • Presentation materials from the 2013.10.10 (Thu) seminar:

    1. Introduction to Hadoop system installation and operation
    http://www.slideshare.net/womendevel/hadoop-prov-rex
    2. Introduction to big data analytics algorithms and case studies
    http://www.slideshare.net/womendevel/big-data-analytics-and-data-mining
    3. Data analysis case studies using open source big data analytics tools
    http://www.slideshare.net/womendevel/mapreduce-based

Transcript

  • 1. Hadoop meet (R)?ex - How to use Rex for Hadoop cluster construction Original Rex base image http://rexify.org 2013-08-26 Original Hadoop image http://hadoop.apache.org
  • 2. Background
  • 3. Mission • I'm not a S/W developer any more • I'm not a system engineer • But I had to construct a Hadoop cluster – moreover, in various types... http://www.gkworld.com/product/GKW49102/Simpsons-Cruel-Fate- Why-Mock-Me-Homer-Magnet-SM130.html
  • 4. Hadoop is • A Hadoop cluster consists of many Linux boxes • Hadoop has many configuration files and parameters • Besides Hadoop, a variety of Hadoop ecosystem S/W should be installed • Beyond Hadoop & the Hadoop eco, many other kinds of S/W should be installed & configured – Tomcat, Apache, DBMS, other development tools, other utils/libs… • And so on …
  • 5. At first • I did it all manually – Install & configure – Install & configure – Install & configure – Install & configure – …. Img http://www.construire-en-vendee.fr/la-construction-dune-maison-de-a-a-z-les-fondations.html
  • 6. Tiresome !! • It is a really tedious & horrible job !! Img http://cuteoverload.com/2009/08/17/your-story-has-become-tiresome/
  • 7. Finding another way • I decided to find another way!! • I started to survey other solutions Img http://www.101-charger.com/wallpapers/21526,jeux,gratuit,pathfinder,7.html
  • 8. Survey
  • 9. A variety of solutions • Hadoop Managers • Provisioning Tools • Parallel SSH Tools http://www.cbsnews.com/8301-505125_162-31042083/duke- research-monkeys-like-humans-want-variety/
  • 10. Hadoop Managers Hortonworks Management Center™ Cloudera's CDH™ * Apache Ambari
  • 11. Provisioning Tools Fabric(Python)
  • 12. Parallel SSH Tools http://dev.naver.com/projects/dist/ https://code.google.com/p/parallel-ssh/ http://sourceforge.net/projects/clusterssh/
  • 13. Examination(1/3) • Hadoop Managers ↑ Specialized for Hadoop ↑ Already proven ↑ Comfortable ↓ Commercial or restrictive license ↓ No support for other apps/libs beyond Java/Hadoop/Hadoop Eco
  • 14. Other solutions • Hadoop Managers • Provisioning Tools • Parallel SSH Tools http://www.bizbuilder.com/how-much-does-an-inexpensive- franchise-cost/ → I have no money → I want to make use of extra resources ※ Recently there have been many changes in license policy. Please check it!!
  • 15. Examination(2/3) • Other provisioning tools ↑ Powerful ↑ Many features ↑ Detailed control ↓ Complexity ↓ Requires a lot of study
  • 16. Other solutions • Hadoop Managers • Provisioning Tools • Parallel SSH Tools source: www.mbc.co.kr → I don't like studying
  • 17. Examination(3/3) • Other parallel SSH tools ↑ Simple ↑ Useful ↑ No need to install an extra agent ↓ Some features are insufficient ↓ All exceptional cases have to be handled yourself
  • 18. Other solutions • Hadoop Managers • Provisioning Tools • Parallel SSH Tools http://bluebuddies.com/Smurfs_Panini_Smurf_Stickers-7.htm → Yes, I'm greedy
  • 19. ● Simple & ● Powerful & ● No cost & ● Expandable & ● Smart way??? http://plug.hani.co.kr/heihei9999/459415 So, what is it?
  • 20. I have found a solution
  • 21. http://rexify.org/ It is Rex!!
  • 22. ● uses just ssh ● no agent required ● seamless integration ● no conflicts ● easy to use ● easy to extend ● easy to learn ● can use the power of advanced Perl http://swapiinthehouse.blogspot.kr/2012/02/final-term-was-over- and-let-holiday.html Rex is
  • 23. Rex options [onycom@onydev: ~]$rex -h (R)?ex - (Remote)? Execution -b Run batch -e Run the given code fragment -E Execute task on the given environment -H Execute task on these hosts -G Execute task on these group -u Username for the ssh connection -p Password for the ssh connection -P Private Keyfile for the ssh connection -K Public Keyfile for the ssh connection -T List all known tasks. -Tv List all known tasks with all information. -f Use this file instead of Rexfile -h Display this help -M Load Module instead of Rexfile -v Display (R)?ex Version -F Force. Don't regard lock file -s Use sudo for every command -S Password for sudo -d Debug -dd More Debug (includes Profiling Output) -o Output Format -c Turn cache ON -C Turn cache OFF -q Quiet mode. No Logging output -Q Really quiet. Output nothing. -t Number of threads to use
  • 24. Basic Grammar - Authentication From>> http://www.slideshare.net/jfried/rex-25172864?from_search=3
  • 25. Basic Grammar - Server Group From>> http://www.slideshare.net/jfried/rex-25172864?from_search=3
  • 26. Basic Grammar - Task From>> http://www.slideshare.net/jfried/rex-25172864?from_search=3 (a minimal sketch of all three ideas follows below)
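    A minimal Rexfile sketch of the three ideas above (authentication, server groups, tasks), reusing the host names and account that appear later in this deck; the file name basic.Rexfile and the check_uptime task are illustrative, not part of the original slides:

    # basic.Rexfile - a sketch only
    # Authentication: password auth here; key-based auth is shown later in this deck
    user "hadoop-user";
    password "blabla";
    # key_auth;
    # private_key "/home/onycom/.ssh/id_rsa";
    # public_key  "/home/onycom/.ssh/id_rsa.pub";

    # Server group: name a set of hosts once, reuse it in every task
    group "hadoop_node" => "vmaster", "vnode[0..2]";

    # Task: a named unit of work that runs on a host or a group
    desc "Show uptime on every hadoop node";
    task "check_uptime", group => "hadoop_node", sub {
        say run "uptime";
    };

    # e.g.) rex -f ./basic.Rexfile -T            # list the tasks
    #       rex -f ./basic.Rexfile check_uptime  # run the task on the group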
  • 27. Let's get down to the main subject!
  • 28. Construct hadoop with (R)?ex
  • 29. This presentation is about ● How to easily install & configure Hadoop – not "how to optimize & tune performance" ● For easier understanding, – exceptional cases are excluded ● No explanation of OS installation – no discussion of "PXE/kickstart" ● Reduced environment conditions – ex) security, network, other servers/Apps, … ● I'll avoid talking about the Perl language as much as possible – it is not needed ● TMTOWTDI – Even if it's not refined, I'll show as many different ways as possible
  • 30. Network Topology vmaster (Name node/Job Tracker), L2 switch, Onydev (Provision Server), vnode0 (Data node), vnode1 (Data node), vnode2 (Data node), vmonitor (Monitoring Server) [spec] • Machine : 6 ea (the hadoop cluster itself is just 4 ea) • OS : CentOS 6.4 64bit • Memory : 32GB(NN) 16GB(DN) • CPU : 4 core(i7, 3.5GHz) • Interface : 1G Ethernet • Disk : 250GB SSD 1TB HDD ※ I've configured NN and JT on the same machine
  • 31. Our hadoop env. ● There is one control account – 'hadoop-user' ● Hadoop & the Hadoop eco are installed under the 'hadoop-user' account
  • 32. Prepare – All machines ● Each machine – has the same OS version installed (at least the hadoop cluster nodes) – has its own fixed IP address – can be connected to with SSH – has at least one normal user account & its sudoers entry edited (just optional)
  • 33. Prepare – Provision Server(1/2) ● Development tools & environment – ex: gcc, glib, make/cmake, perl, etc... ● Install Perl modules – yum install perl-ExtUtil* – yum install perl-CPAN* – execute the 'cpan' command
  • 34. Prepare – Provision Server(2/2) ● After executing the 'cpan' command – cpan 3> install Rex – It may fail!! – This whole story is based on CentOS 6.XX ● So, I recommend 'perlbrew' – if you want more Perl power ※ My guess is that Red Hat may dislike the Perl language
  • 35. To Install Rex (1/3)
    adduser brew-user
    passwd brew-user
    curl -L http://install.perlbrew.pl | bash
    cd /home
    chmod 755 brew-user
    cd ~brew-user
    chmod -R 755 ./perl5
    echo 'export PERLBREW_ROOT="/home/brew-user/perl5/perlbrew"' >> /home/brew-user/.bashrc
    ## Append "$PERLBREW_ROOT/bin" to PATH in the .bashrc
    source ~brew-user/.bashrc
  • 36. To Install Rex (2/3)
    ## In the brew-user account,
    perlbrew init
    perlbrew available
    ### Choose the recommended stable perl 5.18.0 (this time is 2013/07/11)
    perlbrew install perl-5.18.0
    perlbrew switch perl-5.18.0
    [brew-user@onydev: ~]$perlbrew switch perl-5.18.0
    Use of uninitialized value in split at /loader/0x1f2f458/App/perlbrew.pm line 34.
    .........
    A sub-shell is launched with perl-5.18.0 as the activated perl. Run 'exit' to finish it.
  • 37. To Install Rex (3/3) ● cpanm Rex ● cpan ● http://rexify.org/get/
  • 38. Test for Rex [onycom@onydev: ~]$which rex /home/brew-user/perl5/perlbrew/perls/perl-5.18.0/bin/rex [onycom@onydev: ~]$rex -H localhost -u onycom -p blabla -e "say run 'hostname'" [2013-10-08 15:36:06] INFO - Running task eval-line on localhost [2013-10-08 15:36:06] INFO - Connecting to localhost:22 (onycom) [2013-10-08 15:36:07] INFO - Connected to localhost, trying to authenticate. [2013-10-08 15:36:07] INFO - Successfully authenticated on localhost. onydev [onycom@onydev: ~]$ ● Rexfile ● plain text file
  • 39. /etc/hosts - Provision Server 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 ... skip ................. 192.168.2.100 onydev ... skip ................. 192.168.2.51 vmaster 192.168.2.52 vnode0 192.168.2.53 vnode1 192.168.2.54 vnode2 192.168.2.59 vmonitor ~
  • 40. SSH connection ● Between – Provision server and other target servers – Hadoop master node and data nodes
  • 41. Prepare SSH public key
    [onycom@onydev: ~]$ ssh-keygen -t rsa
    Enter file in which to save the key (/home/onycom/.ssh/id_rsa):
    Created directory '/home/onycom/.ssh'.
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /home/onycom/.ssh/id_rsa.
    Your public key has been saved in /home/onycom/.ssh/id_rsa.pub.
  • 42. Create User
    use Rex::Commands::User;
    group "hadoop_node" => "vmaster", "vnode[0..2]";
    group "all_vm_node" => "vmaster", "vnode[0..2]", "vmonitor";
    my $USER = "hadoop-user";
    desc "Create user";
    task "new_user", group => "all_vm_node", sub {
        create_user "$USER",
            home     => "/home/$USER",
            comment  => "Account for hadoop",
            password => "blabla";
    };
    [onycom@onydev: Prov]$ rex -f ./hd-su.Rexfile -u root -p <pass> new_user
  • 43. Setup SSH for user
    desc "setup ssh for user";
    task "setup_ssh_user", group => "all_vm_node", sub {
        run "mkdir /home/$USER/.ssh";
        file "/home/$USER/.ssh/authorized_keys",
            source => "/home/onycom/.ssh/id_rsa.pub",
            owner  => "$USER",
            group  => "$USER",
            mode   => 644;
        run "chmod 700 /home/$USER/.ssh";
    };
    [onycom@onydev: Prov]$ rex -f ./hd-su.Rexfile -u hadoop-user -p <pass> setup_ssh_user
    ※ OK!! Done. Now you can log in to each server without a password. Then do the same thing for the hadoop NN/DN nodes (see the sketch below).
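    The "do the same thing for the hadoop NN/DN nodes" step is not shown on the slide. A hedged sketch of one way to script it: generate a key for hadoop-user on vmaster (only if none exists), pull the public key back to the provision server as a temporary file, and append it on each data node. The task names and the temporary file name vmaster_id_rsa.pub are my assumptions:

    desc "Create an SSH key for hadoop-user on the name node (if missing)";
    task "gen_nn_key", "vmaster", sub {
        run "test -f ~/.ssh/id_rsa || ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa";
        # copy the public key back to the provision server
        download "/home/hadoop-user/.ssh/id_rsa.pub", "vmaster_id_rsa.pub";
    };

    desc "Append the name node's public key on every data node";
    task "push_nn_key", "vnode[0..2]", sub {
        upload "vmaster_id_rsa.pub", "/home/hadoop-user/vmaster_id_rsa.pub";
        run "cat ~/vmaster_id_rsa.pub >> ~/.ssh/authorized_keys";
        run "chmod 600 ~/.ssh/authorized_keys && rm -f ~/vmaster_id_rsa.pub";
    };

    # e.g.) rex -f ./hd-ssh.Rexfile -u hadoop-user -p <pass> gen_nn_key push_nn_key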
  • 44. Install packages
    parallelism 4;
    desc "Install packages for java";
    task "install_java", group => "all_vm_node", sub {
        install package => "java-1.6.*";
    };
    [onycom@onydev: Prov]$ rex -f ./hd-su.Rexfile -u root -p <pass> install_java
    • Some packages should be installed globally (ex: java, wget, etc)
    • For hadoop 1.1.x, java 1.6 is recommended.
    • Use the parallelism keyword (if tasks take a long time)
  • 45. Install hadoop(1/3)
    user "hadoop-user";
    private_key "/home/onycom/.ssh/id_rsa";
    public_key "/home/onycom/.ssh/id_rsa.pub";
    group "hadoop_node" => "vmaster", "vnode[0..2]";
    group "all_vm_node" => "vmaster", "vnode[0..2]", "vmonitor";
    desc "prepare_dir";
    task "prepare_dir", group => "hadoop_node", sub {
        run "mkdir Work";
        run "mkdir Download";
        run "mkdir src";
        run "mkdir tmp";
    };
    hd1.Rexfile
    [onycom@onydev: Prov]$ rex -f ./hd1.Rexfile prepare_dir
  • 46. Install hadoop(2/3)
    desc "hadoop 1.1.2 download with wget";
    task "get_hadoop", group => "hadoop_node", sub {
        my $f = run "wget http://archive.apache.org/dist/hadoop/core/hadoop-1.1.2/hadoop-1.1.2.tar.gz",
            cwd => "/home/hadoop-user/src";
        say $f;
    };
    ...skip....
    desc "pig 0.11.1 download with wget";
    task "get_pig", group => "hadoop_node", sub {
        my $f = run "wget http://apache.tt.co.kr/pig/pig-0.11.1/pig-0.11.1.tar.gz",
            cwd => "/home/hadoop-user/src";
        say $f;
    };
    ! The hadoop version & hadoop eco S/W versions should match – that topic is outside the scope of this presentation
  • 47. Install hadoop(3/3) my $HADOOP_SRC_DIR = "/home/hadoop-user/src"; desc "unzip hadoop source files"; task "unzip_src",group=>"hadoop_node", sub { run "tar xvfz hadoop-1.1.2.tar.gz", cwd=>"$HADOOP_SRC_DIR"; run "tar xvfz hive-0.11.0.tar.gz", cwd=>"$HADOOP_SRC_DIR"; run "tar xvfz pig-0.11.1.tar.gz", cwd=>"$HADOOP_SRC_DIR"; }; desc "make link for hadoop source files"; task "link_src", group=>"hadoop_node", sub { run "ln -s ./hadoop-1.1.2 ./hadoop", cwd=>$HADOOP_SRC_DIR; run "ln -s ./hive-0.11.0 ./hive", cwd=>$HADOOP_SRC_DIR; run "ln -s ./pig-0.11.1 ./pig", cwd=>$HADOOP_SRC_DIR; };
  • 48. Configuration files(1/3) ● System – /etc/hosts ● Hadoop(../hadoop/conf) – masters & slaves – hadoop-env.sh – hdfs-site.xml – core-site.xml – mapred-site.xml
  • 49. Configuration files(2/3) ● Hadoop eco systems & other tools – ex) Ganglia – ex) Flume – agent/collector/master – ex) Oozie or flamingo – these are skipped in this PPT ● User rc file → These are just defaults & do not consider optimization
  • 50. Configuration files(3/3) Provision Server Hadoop NN Hadoop DN 1 Hadoop DN n Hadoop configuration files (../hadoop_conf_repo) SSH/SCP (R)ex ※ Of course, this is just my policy
  • 51. Edit hosts file
    my $target_file = "/etc/hosts";
    my $host_list = <<'END';
    192.168.2.51 vmaster
    192.168.2.52 vnode0
    192.168.2.53 vnode1
    192.168.2.54 vnode2
    192.168.2.59 vmonitor
    END
    desc "Add hosts";
    task "add_host", group => "all_vm_node", sub {
        my $exist_cnt = cat $target_file;
        my $fh = file_write $target_file;
        $fh->write($exist_cnt);
        $fh->write($host_list);
        $fh->close;
    };
    ※ You can consider the 'Augeas' tool to handle system files. Please refer to 'Rex::Augeas' or 'http://augeas.net'
  • 52. Setup .bashrc for user(1/2) ... skip ..... my $hadoop_rc=<<'END'; #Hadoop Configuration export JAVA_HOME="/usr/lib/jvm/jre-1.6.0-openjdk.x86_64" export CLASSPATH="$JAVA_HOME/lib:$JAVA_HOME/lib/ext" export HADOOP_USER="/home/hadoop-user" export HADOOP_SRC="$HADOOP_USER/src" export HADOOP_HOME="$HADOOP_USER/hadoop" export PIG_HOME="$HADOOP_SRC/pig" export HIVE_HOME="$HADOOP_SRC/hive" END ... skip .....
  • 53. Setup .bashrc for user(2/2) desc "setup hadoop-user's .rc file"; task "setup_rc_def", group=>"hadoop_node", sub { my $fh = file_append ".bashrc"; $fh->write($base_rc); $fh->write($hadoop_rc); $fh->close(); }; desc "setup hadoop master node .rc file"; task "setup_rc_master", "vmaster", sub { my $fh = file_append ".bashrc"; $fh->write($master_rc); $fh->close(); }; .......... skip ............
  • 54. Configure Hadoop(1/6) ● ‘masters’ [hadoop-user@vmaster: ~]$cd hadoop/conf [hadoop-user@vmaster: conf]$cat masters vmaster ● ‘slaves’ [hadoop-user@vmaster: conf]$cat slaves vnode0 vnode1 vnode2
  • 55. Configure Hadoop(2/6) • hadoop-env.sh
    ... skip ...
    # The only required environment variable is JAVA_HOME. All others are
    # optional. When running a distributed configuration it is best to
    # set JAVA_HOME in this file, so that it is correctly defined on
    # remote nodes.
    # The java implementation to use. Required.
    # export JAVA_HOME=/usr/lib/j2sdk1.5-sun
    export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64 #hadoop-user
    # Remove the warning message for "HADOOP_HOME is deprecated"
    export HADOOP_HOME_WARN_SUPPRESS=TRUE
  • 56. Configure Hadoop(3/6) • hdfs-site.xml ... skip ... <configuration> <!-- modified by hadoop-user --> <property> <name>dfs.replication</name> <value>2</value> </property> <property> <name>dfs.name.dir</name> <value>/home/hadoop-user/hdfs/name</value> </property> <property> <name>dfs.data.dir</name> <value>/home/hadoop-user/hdfs/data</value> </property> </configuration> ※ This 'replication' value depends on our env.
  • 57. Configure Hadoop(4/6) • core-site.xml ... skip ... <configuration> <!--modified by hadoop-user --> <property> <name>fs.default.name</name> <value>hdfs://vmaster:9000</value> </property> </configuration>
  • 58. Configure Hadoop(5/6) • mapred-site.xml .. skip .. <property> <name>mapred.job.tracker</name> <value>vmaster:9001</value> </property> <!-- 2013.9.11. Increase the task timeout to fix the "failed to report status" error --> <property> <name>mapred.task.timeout</name> <value>1800000</value> <description>The number of milliseconds before a task will be terminated if it neither reads an input, writes an output, nor updates its status string. </description> </property> ※ This 'timeout' value just depends on our env.
  • 59. Configure Hadoop(6/6) my $CNF_REPO="hadoop_conf_repo"; ... skip ... my $MAPRED="mapred-site.xml"; task "upload_mapred", group=>"hadoop_node", sub { file "$HD_CNF/$MAPRED", owner => $HADOOP_USER, group => $HADOOP_USER, source => "$CNF_REPO/$MAPRED"; }; my $CORE_SITE="core-site.xml"; task "upload_core", group=>"hadoop_node", sub { file "$HD_CNF/$CORE_SITE", owner => $HADOOP_USER, group => $HADOOP_USER, source => "$CNF_REPO/$CORE_SITE"; }; ... skip ....
  • 60. Before going any further ● Stop selinux – if it is enforcing ● Modify the iptables policy – I recommend stopping it while the configuration work is going on (see the sketch below)
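    A hedged sketch of pushing those two steps out with Rex as well (run as root; setenforce 0 only lasts until the next reboot - edit /etc/selinux/config for a permanent change; the task name is illustrative):

    desc "Relax SELinux and iptables while configuring (CentOS 6)";
    task "relax_security", group => "all_vm_node", sub {
        run "setenforce 0";           # switch SELinux to permissive until reboot
        run "service iptables stop";  # stop the firewall for now
        run "chkconfig iptables off"; # optional: keep it off after reboot
    };

    # e.g.) rex -f ./hd-su.Rexfile -u root -p <pass> relax_security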
  • 61. Let's start hadoop ● Log in to the master node as hadoop-user – ssh -X hadoop-user@vmaster ● Format the hadoop namenode – hadoop namenode -format ● Execute the start script – ex) start-all.sh (a Rex version of these steps is sketched below)
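    The same three steps can also be driven from the provision server with a task that runs on vmaster only. This is a sketch: it assumes $HADOOP_HOME/bin is on hadoop-user's PATH (part of the .bashrc that was skipped earlier), and bash -lc is used so that the login environment gets loaded; on a fresh cluster the -format step has nothing to confirm:

    desc "Format HDFS and start the Hadoop daemons (run on the name node only)";
    task "start_hadoop", "vmaster", sub {
        run 'bash -lc "hadoop namenode -format"';  # only once, on an empty name dir
        run 'bash -lc "start-all.sh"';
        say run 'bash -lc "jps"';                  # quick sanity check
    };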
  • 62. Check hadoop status [hadoop-user@vmaster: ~]$jps -l 22161 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode 22260 org.apache.hadoop.mapred.JobTracker 21968 org.apache.hadoop.hdfs.server.namenode.NameNode 27896 sun.tools.jps.Jps [hadoop-user@vmaster: ~]$hadoop fs -ls / Found 1 items drwxr-xr-x - hadoop-user supergroup 0 2013-10-07 20:33 /tmp ※ It seems to be OK. Really?
  • 63. But, life is not easy http://www.trulygraphics.com/tg/weekend/
  • 64. Check status for all DNs task "show_jps", "vnode[0..2]", sub { say run "hostname"; my $r = run "jps"; say $r; }; [onycom@onydev: Prov]$rex -f ./hd2.Rexfile show_jps vnode0 12682 Jps 12042 TaskTracker 11934 DataNode vnode1 11669 DataNode 11778 TaskTracker 12438 Jps vnode2 11128 DataNode 11237 TaskTracker 11895 Jps
  • 65. If there is some problem, http://blog.lib.umn.edu/isss/undergraduate/2011/11/y ou-do-have-any-tech-problem.html ● Check again – /etc/hosts – selinux & iptables – name & data dir./permissions in hdfs – and so on... (on each node; a helper task is sketched below)
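    A hedged helper that gathers that checklist from every node in one pass (the task name is illustrative; run it as root so the iptables status is readable):

    desc "Show hosts entries, SELinux/iptables state and HDFS dir permissions";
    task "diagnose", group => "all_vm_node", sub {
        say run "hostname";
        say run "grep -E 'vmaster|vnode|vmonitor' /etc/hosts";
        say run "getenforce";
        say run "service iptables status | head -1";
        say run "ls -ld /home/hadoop-user/hdfs/name /home/hadoop-user/hdfs/data 2>/dev/null";
    };

    # e.g.) rex -f ./hd-su.Rexfile -u root -p <pass> diagnose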
  • 66. If you did not run into any problems, or have fixed them,
  • 67. Now you have hadoop https://hadoopworld2011.eventbrite.com/ Automatic MGM/Prov. solution yonhap &
  • 68. Advanced Challenges
  • 69. What more can we do?(1/2) ● add/remove a data node (a sketch follows below) ● add/remove storage ● Integrate with monitoring – ex: Ganglia/Nagios ● Integrate with other hadoop eco – Flume, flamingo, Oozie ● Integrate other devices or servers – ex: switch, DB server
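    For the first item, a sketch of bringing one more data node into the running Hadoop 1.x cluster; the host name vnode3 is hypothetical, and it is assumed the new box has already been prepared with the same user/package/install tasks (and /etc/hosts entries) as the other nodes:

    desc "Register vnode3 in the master's slaves file";
    task "register_vnode3", "vmaster", sub {
        append_if_no_such_line "/home/hadoop-user/hadoop/conf/slaves", "vnode3";
    };

    desc "Start the Hadoop daemons on the new data node";
    task "start_vnode3", "vnode3", sub {
        run 'bash -lc "hadoop-daemon.sh start datanode"';
        run 'bash -lc "hadoop-daemon.sh start tasktracker"';
    };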
  • 70. What more can we do?(2/2) ● sophisticated hadoop parameter control – ex: use XML parsing (see the sketch below) ● workflow control & batch ● backup ● periodic file system management – ex: log files ● web GUI ● make a framework for your purpose
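    For the "XML parsing" idea, one option (my assumption, not something shown in the deck) is to edit the copies in hadoop_conf_repo on the provision server with XML::LibXML and then push them out again with the upload tasks from slide 59:

    # tweak_replication.pl - sketch: change dfs.replication in the local repo copy
    use strict;
    use warnings;
    use XML::LibXML;

    my $file = "hadoop_conf_repo/hdfs-site.xml";
    my $doc  = XML::LibXML->load_xml(location => $file);

    for my $prop ($doc->findnodes('/configuration/property')) {
        my ($name)  = $prop->findnodes('./name');
        my ($value) = $prop->findnodes('./value');
        next unless $name && $name->textContent eq 'dfs.replication';
        $value->removeChildNodes();
        $value->appendText('3');   # the new replication factor
    }

    $doc->toFile($file, 1);        # 1 = pretty-print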
  • 71. Ref. • http://hadoop.apache.org/ • http://pig.apache.org/ • http://hive.apache.org/ • http://confluence.openflamingo.org • http://www.openankus.org • http://www.rexify.org • https://groups.google.com/forum/#!forum/rex-users • http://modules.rexify.org/search?q=hadoop
  • 72. http://www.projects2crowdfund.com/what-can-i-do-with- crowdfunding/
  • 73. Thanks! junkim@onycom.com / rainmk6@gmail.com