Apache Hadoop cluster on Macintosh OSX

The Trigger: #DIY

The Kitchen Setup
The Network: Master Chef a.k.a. Namenode, Helpers a.k.a. Datanode(s)
The Base Ingredients
– Hive 0.13.0
– OSX 10.7.5
– Homebrew 0.9.5
– Network: 200 MB/s
– Hadoop 2.4.0
– Java 1.7.0_55
– MySQL 5.6.17
Basics
• Ensure that all the namenode and datanode machines are running
on the same OSX version
• For the purpose of this POC, I have selected OSX 10.7.5. All sample
commands are specific to this OS; you may need to tweak them to
suit your OS version
• I am a Homebrew fan, so I have used the old-and-gold Ruby-based
platform to download all the software needed to run the POC. You
may very well opt to download the installers individually and
tweak the process if you wish
• You will need a fair bit of understanding of OSX and Hadoop to
follow along. If not, no worries – most of the steps can be looked
up with a simple Google search
• The “Namenode” machine needs more RAM than the “Datanode”
machines. Please configure the namenode machine with at least 8
GB of RAM
The Cooking
• Ensure that ALL datanode and namenode machines are running on the
same OSX version and preferably have a regulated software update strategy
(i.e. automatic software updates disabled)
• Disable the automatic “sleep” options on the machines (from System
Preferences) to keep them from going into hibernation
• Download and install the “Xcode command line tools for Lion” (skip if Xcode
is present)
• As of today, Hadoop is not IPv6 friendly, so please disable IPv6 on all
machines:
– The “networksetup -listallnetworkservices” command displays all the network
names that your machine uses to connect to your network (e.g. Ethernet, Wi-Fi)
– “networksetup -setv6off Ethernet” disables IPv6 over Ethernet (you may
need to change the network name if yours is different)
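If a machine exposes several network services, a small terminal loop can switch IPv6 off for all of them at once. This is a sketch, not from the original deck; it assumes every listed service should have IPv6 disabled:

networksetup -listallnetworkservices | tail -n +2 | sed 's/^\*//' | \
while read -r svc; do
  sudo networksetup -setv6off "$svc"   # e.g. Ethernet, Wi-Fi, Thunderbolt Bridge
done

The tail skips the explanatory first line of the listing, and the sed strips the asterisk that marks already-disabled services.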
The Cooking..
• Give logical names to ALL machines, e.g. namenode.local, datanode01.local,
datanode02.local, et al. (from System Preferences -> Sharing -> Computer
Name)
• Enable the following services from the Sharing panel of System
Preferences
– File Sharing
– Remote Login
– Remote Management
• Create one universal username (with Administrator privileges) on all
machines, e.g. hadoopuser. Preferably use the same password everywhere
• For the rest of the steps, please log in as this user and execute the commands
The Cooking
• On the namenode, run the command:
vi /etc/hosts
• Add an entry for each datanode, one per line (IP address followed by
hostname; see the sample after the next step)
• On each of the datanodes, run the command:
vi /etc/hosts
• Add the namenode hostname
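For illustration, the namenode’s /etc/hosts could look like the lines below, and each datanode’s /etc/hosts would carry at least the namenode line (the IP addresses are placeholders for your own LAN addresses):

192.168.1.10  namenode.local
192.168.1.11  datanode01.local
192.168.1.12  datanode02.local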
• Then, on each machine, run the command:
sudo visudo
• Add an entry on the last line of the file as under:
hadoopuser ALL=(ALL) NOPASSWD: ALL
Coffee Time
• Install the Java JDK and JRE on all the machines from the Oracle site
(http://bit.ly/1s2i7VC), then configure $JAVA_HOME as shown below
• Set $JAVA_HOME on ALL machines. Usually it is best to configure this in
your .profile file. Run the following command to open your .profile:
vi ~/.profile
• Paste the following line in the file and save it:
export JAVA_HOME="`/System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands/java_home`"
• You may additionally paste the following lines in the same file; they are
helpful for housekeeping activities:
export PATH=$PATH:/usr/local/sbin
PS1="\H : \d \t: \w :"
The prompt escapes \H, \d, \t and \w expand to the hostname, date, time and
working directory.
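To confirm the setup took effect, reload the profile and check (a quick sanity check, not part of the original deck):

source ~/.profile
echo $JAVA_HOME    # should print the JDK path resolved by java_home
java -version      # should report the Oracle JDK you installed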
The Brewing
• Install “brew” and the other components via it
– Run in the terminal:
ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)"
[the quotes need to be there]
– Run the following command in the terminal to ensure that it has been installed properly:
brew doctor
– Run the following commands, in this order, in the terminal:
brew install makedepend
brew install wget
brew install ssh-copy-id
brew install hadoop
– Run the following commands on the “namenode” machine only:
brew install hive
brew install mysql
[the assumption is that the namenode will host the resourcemanager, jobtracker, hive metastore
and hiveserver. brew installs the software under the “/usr/local/Cellar” location]
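A quick way to confirm what brew installed and where (run on the namenode; a sanity check, not from the original deck):

brew list --versions hadoop hive mysql
ls /usr/local/Cellar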
The Saute
– Run the following command to set up keyless login from the namenode to ALL
datanodes. Run the command on the namenode:
ssh-keygen
[press the Enter key twice to accept the default RSA key and no passphrase]
– Run the following command once for EACH datanode hostname, on the
namenode:
ssh-copy-id hadoopuser@datanode01.local
Provide the password when prompted. The command is verbose and tells you
whether the key was installed properly. You may validate it by executing:
ssh hadoopuser@datanode01.local
It should NOT ask you to supply a password anymore.
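With more than a couple of datanodes, a small loop saves the repetition (a sketch; substitute your own hostnames):

for node in datanode01 datanode02 datanode03; do
  ssh-copy-id hadoopuser@${node}.local
done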
After the requisite software has been installed, the next step is to configure the different
components in a stepwise manner. Hadoop works in a distributed mode with the “namenode”
as the central hub of the cluster. That is reason enough to create the common
configuration files on the namenode first, and then copy them in an automated manner
to all the datanodes. Let’s start with the .profile changes on the namenode machine first.
The Slow cooking
– We are going to configure Hive to use MySQL as the metastore for this POC. All we
need is to create a DB user “hiveuser” with a valid password in the MySQL instance
installed and running on the namenode, AND to copy the MySQL JDBC driver jar into
the Hive lib directory (a sample set of statements follows)
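For illustration, the metastore user could be created like this from the MySQL shell (a sketch; the database name “metastore” and the password are assumptions, not from the original deck):

CREATE DATABASE metastore;
CREATE USER 'hiveuser'@'localhost' IDENTIFIED BY 'hivepassword';
GRANT ALL PRIVILEGES ON metastore.* TO 'hiveuser'@'localhost';
FLUSH PRIVILEGES;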
– On the namenode, run this command to change to your HADOOP_CONF_DIR
location:
cd /usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop
Here we need to create/modify the following set of files (a minimal sample
follows the list):
slaves
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
log4j.properties
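By way of example, the most basic of these could look as follows (a minimal sketch; the hostnames, port and replication factor are assumptions to adapt to your cluster):

slaves:
datanode01.local
datanode02.local

core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.local:9000</value>
  </property>
</configuration>

hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>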
– On the namenode, run this command to change to your HIVE_CONF_DIR location:
cd /usr/local/Cellar/hive/0.13.0/libexec/conf
Here we need to create/modify the following set of files (again, a sample follows):
hive-site.xml
hive-log4j.properties
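A minimal hive-site.xml pointing Hive at the MySQL metastore might contain the following (a sketch; the JDBC URL and credentials mirror the assumptions above):

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://namenode.local:3306/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
  </property>
</configuration>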
The Plating
– Please find attached a simple script that, if installed on the namenode, helps you
copy your config files to ALL datanodes (I call it the config-push)
– Please find attached another simple script that I use for rebooting all the
datanodes
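The original attachments are not part of this extract. A config-push need not be more than a loop like this sketch (the hostnames and paths are assumptions):

#!/bin/bash
# Copy the Hadoop config directory from the namenode to every datanode.
for node in datanode01.local datanode02.local; do
  scp /usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop/* \
      hadoopuser@$node:/usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop/
done

and a reboot-all script is the same loop around a remote sudo (the NOPASSWD sudoers entry set up earlier is what lets it run non-interactively):

for node in datanode01.local datanode02.local; do
  ssh hadoopuser@$node 'sudo reboot'
done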
The Garnishing
– You may wish to take these next steps:
– Install Zookeeper
– Configure and run journalnodes
– Go for a High Availability cluster implementation with multiple Namenodes
– Leave feedback if you wish to see sample Hadoop configurations
Disclaimer: Don’t sue me for any damage/infringement, I am not rich :)