Hadoop Installation and basic configuration
  • 1. Hadoop
    ● HDFS/MapReduce Architecture
    ● Hardware
    ● Installation and Configuration
    ● Monitoring
    ● Namenode
  • 2. HDFS Architecture
  • 3. Replication
  • 4. MapReduce
  • 5. Hardware Requirements
    ● NameNode + JobTracker
      – >= 2 cores
      – >= 8 GB RAM
      – >= 40 GB disk, RAID 10
    ● DataNode + TaskTracker
      – >= 4 cores
      – >= (1 (OS) + 1 (TaskTracker) + 1 (DataNode) + reducers + maps) GB RAM, where reducers and maps are the reduce and map slots configured on the node
      – >= N GB disk space, JBOD (no RAID)
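The DataNode + TaskTracker RAM rule above is a simple sum. A minimal Python sketch of it, assuming (as the formula implies) about 1 GB per configured map or reduce slot; the function name is hypothetical:

```python
def min_worker_ram_gb(map_slots: int, reduce_slots: int) -> int:
    """Minimum RAM in GB for a DataNode + TaskTracker host, per the slide's rule.

    1 GB for the OS + 1 GB for the TaskTracker + 1 GB for the DataNode,
    plus (assumption) 1 GB for every map slot and every reduce slot.
    """
    if map_slots < 0 or reduce_slots < 0:
        raise ValueError("slot counts must be non-negative")
    os_gb, tasktracker_gb, datanode_gb = 1, 1, 1
    return os_gb + tasktracker_gb + datanode_gb + map_slots + reduce_slots


if __name__ == "__main__":
    # 8 map slots and 4 reduce slots on one worker.
    print(min_worker_ram_gb(8, 4))
```

For example, a worker configured with 8 map slots and 4 reduce slots needs at least 15 GB under this rule.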
  • 6. Installation
    ● Download the tarball from the Apache Hadoop site, or use a prebuilt RPM
    ● https://github.com/gerritjvv/repo
    ● http://bigtop.apache.org/
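A tarball install amounts to extracting the release and giving it a stable path. The sketch below assumes the archive holds a single top-level directory (for example hadoop-1.2.1/, a hypothetical version) and that the stable path is a symlink such as /opt/hadoop, matching the /opt/hadoop paths on later slides. The helper name and the symlink convention are assumptions, not part of the original deck.

```python
import os
import tarfile


def install_tarball(tarball: str, prefix: str, link_name: str = "hadoop") -> str:
    """Extract a release tarball under `prefix` and point a stable symlink
    (prefix/link_name, e.g. /opt/hadoop) at the versioned directory inside it.
    Returns the symlink path."""
    with tarfile.open(tarball, "r:gz") as tar:
        # Assumption: the archive holds exactly one top-level directory,
        # such as hadoop-1.2.1/.
        top_dirs = {m.name.split("/", 1)[0] for m in tar.getmembers()}
        if len(top_dirs) != 1:
            raise ValueError(f"expected one top-level directory, found {sorted(top_dirs)}")
        tar.extractall(prefix)
    version_dir = os.path.join(prefix, top_dirs.pop())
    link_path = os.path.join(prefix, link_name)
    # Re-point an existing symlink so an upgrade is just "extract and re-link".
    if os.path.islink(link_path):
        os.remove(link_path)
    os.symlink(version_dir, link_path)
    return link_path
```

Calling `install_tarball("/tmp/hadoop-1.2.1.tar.gz", "/opt")` would leave /opt/hadoop pointing at /opt/hadoop-1.2.1, so configuration paths that use /opt/hadoop survive version upgrades.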
  • 7. Configuration
    ● $HADOOP_HOME/conf/core-site.xml
    ● $HADOOP_HOME/conf/mapred-site.xml
    ● $HADOOP_HOME/conf/hdfs-site.xml
    ● http://hadoop.apache.org/docs/stable/cluster_setup
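All three files share the same layout: a `<configuration>` root with one `<property>` element (holding `<name>` and `<value>`) per setting. A minimal Python sketch that renders a dict into that layout with the standard library's xml.etree.ElementTree; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties: dict) -> str:
    """Render {name: value} as a Hadoop *-site.xml document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    if hasattr(ET, "indent"):  # ET.indent exists on Python 3.9+; purely cosmetic
        ET.indent(root)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


def write_site_xml(path: str, properties: dict) -> None:
    """Write the rendered document to e.g. $HADOOP_HOME/conf/hdfs-site.xml."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(render_site_xml(properties))
```

For instance, `write_site_xml("/opt/hadoop/conf/hdfs-site.xml", {"dfs.replication": 3})` produces a file with a single `dfs.replication` property.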
  • 8. Configuration Namenode
    ● Create a directory for the namenode metadata
      – /data/hadoop/name
    ● Open core-site.xml
      – Define fs.default.name=hdfs://<host>:8020
    ● Open hdfs-site.xml
      – Define dfs.name.dir=/data/hadoop/name
      – Define dfs.replication=3
      – Create dir /data/hadoop/hdfs
      – Define dfs.data.dir=/data/hadoop/hdfs
      – Define dfs.http.address=localhost:50070
    ● Format the namenode once, before its first start (formatting erases any existing metadata)
      – /opt/hadoop/bin/hadoop namenode -format
      – After formatting, start the namenode with service hadoop-namenode start
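The namenode steps can be captured as data plus an ordered command list. This Python sketch only restates the slide's values, with fs.default.name written as an hdfs:// URI (the scheme Hadoop expects for this property), and adds a small check. The function and constant names are hypothetical, and actually running the commands, for example with subprocess.run(cmd, check=True), is left to the reader.

```python
from urllib.parse import urlparse

# Paths and values taken from the slide; <host> becomes the host argument.
NAME_DIR = "/data/hadoop/name"
DATA_DIR = "/data/hadoop/hdfs"


def namenode_config(host: str) -> dict:
    """Namenode settings from the slide, keyed by the file they belong in."""
    return {
        "core-site.xml": {"fs.default.name": f"hdfs://{host}:8020"},
        "hdfs-site.xml": {
            "dfs.name.dir": NAME_DIR,
            "dfs.data.dir": DATA_DIR,
            "dfs.replication": "3",
            # localhost binds the web UI to loopback only; use f"{host}:50070"
            # if it must be reachable from other machines.
            "dfs.http.address": "localhost:50070",
        },
    }


def check_default_fs(uri: str) -> None:
    """Reject fs.default.name values that are not hdfs://<host>:<port>."""
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs" or not parsed.hostname or parsed.port is None:
        raise ValueError(f"expected hdfs://<host>:<port>, got {uri!r}")


# Order matters: format exactly once, before the first start.
# Re-formatting discards the existing namespace metadata in dfs.name.dir.
BOOTSTRAP_COMMANDS = [
    ["/opt/hadoop/bin/hadoop", "namenode", "-format"],
    ["service", "hadoop-namenode", "start"],
]
```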
  • 9. Configuration JobTracker
    ● Open /opt/hadoop/conf/mapred-site.xml
      – Define the property mapred.job.tracker=<host>:8021
      – Create the directory /data/hadoop/mapred
      – Define mapred.local.dir=/data/hadoop/mapred
    ● Start the JobTracker with service hadoop-jobtracker start
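mapred.job.tracker takes a plain host:port pair rather than a URI. A hypothetical Python helper that builds the JobTracker's mapred-site.xml values from the slide and validates that format:

```python
def parse_host_port(value: str) -> tuple:
    """Split a mapred.job.tracker value such as 'jt1:8021' into (host, port)."""
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected <host>:<port>, got {value!r}")
    port_num = int(port)
    if not 0 < port_num < 65536:
        raise ValueError(f"port out of range in {value!r}")
    return host, port_num


def jobtracker_config(host: str, port: int = 8021) -> dict:
    """mapred-site.xml values for the JobTracker host, taken from the slide."""
    return {
        "mapred.job.tracker": f"{host}:{port}",  # host:port, no URI scheme
        "mapred.local.dir": "/data/hadoop/mapred",
    }
```

The same mapred.job.tracker value has to be used on every TaskTracker so they can find the JobTracker.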
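  • 10. Configuration DataNode
    ● On each datanode, create the directory /data/hadoop/hdfs (one directory per disk)
    ● Open /opt/hadoop/conf/core-site.xml
      – Define fs.default.name=hdfs://<host>:8020 (same value as on the namenode)
    ● Open /opt/hadoop/conf/hdfs-site.xml
      – Define dfs.http.address=<host>:50070
      – Define dfs.data.dir=/data/hadoop/hdfs (with several disks, list one directory per disk, comma-separated)
    ● Start the datanodes with service hadoop-datanode start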
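With several disks, dfs.data.dir takes a comma-separated list with one directory per disk, and the DataNode distributes new blocks across them. A small Python sketch that builds that value; the helper name and the mount points /data1 and /data2 in the example are hypothetical.

```python
import posixpath


def data_dirs(mount_points, subdir: str = "hadoop/hdfs") -> str:
    """Build a dfs.data.dir value with one directory per physical disk."""
    dirs = [posixpath.join(m, subdir) for m in mount_points]
    # Listing the same directory twice would not add capacity, so refuse it.
    if len(set(dirs)) != len(dirs):
        raise ValueError("each disk should appear only once")
    return ",".join(dirs)
```

`data_dirs(["/data1", "/data2"])` yields `/data1/hadoop/hdfs,/data2/hadoop/hdfs`.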
  • 11. Configuration MapReduce
    ● On each datanode, create the directory /data/hadoop/mapred
    ● Open /opt/hadoop/conf/mapred-site.xml
      – Define mapred.job.tracker=<host>:8021 (same value as on the JobTracker)
      – Define mapred.local.dir=/data/hadoop/mapred
      – Define mapred.tasktracker.map.tasks.maximum=<number of map tasks>
      – Define mapred.tasktracker.reduce.tasks.maximum=<number of reduce tasks>
    ● Start the TaskTrackers with service hadoop-tasktracker start
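The two slot maximums tie back to the RAM rule on the hardware slide. A hypothetical Python helper that produces the TaskTracker's mapred-site.xml values and refuses slot counts the host cannot hold, assuming about 1 GB per slot as that rule implies:

```python
def tasktracker_config(map_slots: int, reduce_slots: int, ram_gb: int) -> dict:
    """mapred-site.xml values for a TaskTracker host.

    Applies the hardware slide's rule: 1 GB each for the OS, TaskTracker and
    DataNode, plus (assumption) about 1 GB per map or reduce slot.
    """
    needed = 3 + map_slots + reduce_slots
    if needed > ram_gb:
        raise ValueError(
            f"{map_slots} map + {reduce_slots} reduce slots need >= {needed} GB, "
            f"host has {ram_gb} GB"
        )
    return {
        "mapred.local.dir": "/data/hadoop/mapred",
        "mapred.tasktracker.map.tasks.maximum": str(map_slots),
        "mapred.tasktracker.reduce.tasks.maximum": str(reduce_slots),
    }
```

With 16 GB of RAM, 8 map and 4 reduce slots pass (15 GB needed), while 12 and 6 are rejected (21 GB needed).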
  • 12. Monitoring
    ● Web HTML scraping
      – https://github.com/gerritjvv/hadoop-monitoring
    ● Ganglia
      – http://ganglia.info/?p=88
    ● Cacti
      – http://blog.cloudera.com/blog/2009/07/hadoop-graphing
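As an alternative to scraping HTML, many Hadoop versions also serve daemon metrics as JSON at http://<host>:50070/jmx. The sketch below swaps in that endpoint. It assumes your version exposes it and that the FSNamesystemState bean carries the CapacityUsed, CapacityTotal, NumLiveDataNodes and NumDeadDataNodes attributes, so check those names against your cluster's /jmx output before relying on them. Function names are hypothetical.

```python
import json
import urllib.request


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def bean(jmx: dict, name: str) -> dict:
    """Return the bean with the given ObjectName from a /jmx response."""
    for b in jmx.get("beans", []):
        if b.get("name") == name:
            return b
    raise KeyError(name)


def hdfs_summary(jmx: dict) -> dict:
    # Assumed bean and attribute names; verify against your version's /jmx output.
    state = bean(jmx, "Hadoop:service=NameNode,name=FSNamesystemState")
    used, total = state["CapacityUsed"], state["CapacityTotal"]
    return {
        "live_datanodes": state["NumLiveDataNodes"],
        "dead_datanodes": state["NumDeadDataNodes"],
        "percent_used": round(100.0 * used / total, 1) if total else 0.0,
    }
```

A poller would call `hdfs_summary(fetch_json("http://<host>:50070/jmx"))` on a schedule and feed the result to Ganglia, Cacti or an alerting script.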
  • 13. Namenode Edits
    ● Writes, updates and deletes are applied to the metadata in RAM and recorded in a write-ahead log (the edits file)
    ● The edits are merged into the binary image file (fsimage) only when the Secondary NameNode runs a checkpoint
    ● This file is easily corrupted
    ● Recovery is a manual task
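The write-ahead log and checkpoint cycle can be illustrated with a toy in-memory model. This is a simplified Python sketch, not Hadoop's implementation: a real namenode persists fsimage and edits under dfs.name.dir and the Secondary NameNode, typically on a separate host, performs the merge. The ordering is the same, though: log each change, merge the log into the image at checkpoint, and replay the log over the image on restart. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNamenode:
    """Toy model of namenode metadata (not Hadoop code)."""
    image: dict = field(default_factory=dict)      # last checkpointed namespace (fsimage)
    edits: list = field(default_factory=list)      # ops logged since that checkpoint
    namespace: dict = field(default_factory=dict)  # live state held in RAM

    def apply(self, op: str, path: str, meta=None) -> None:
        # Log first, then update RAM, as a write-ahead log requires.
        self.edits.append((op, path, meta))
        self._replay_one(self.namespace, op, path, meta)

    @staticmethod
    def _replay_one(ns: dict, op: str, path: str, meta) -> None:
        if op in ("create", "update"):
            ns[path] = meta
        elif op == "delete":
            ns.pop(path, None)
        else:
            raise ValueError(f"unknown op {op!r}")

    def checkpoint(self) -> None:
        # What the Secondary NameNode does: image + edits -> new image, empty log.
        merged = dict(self.image)
        for op, path, meta in self.edits:
            self._replay_one(merged, op, path, meta)
        self.image, self.edits = merged, []

    def recover(self) -> dict:
        # After a crash RAM is gone; rebuild it by replaying the log over the image.
        ns = dict(self.image)
        for op, path, meta in self.edits:
            self._replay_one(ns, op, path, meta)
        self.namespace = ns
        return ns
```

The model also shows why a damaged image or log is hard to repair automatically: recovery is only as good as the image plus every logged edit since it was written.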
  • 14. HA
    ● YARN and Hadoop 2.0.0
    ● Experimental
    ● http://hadoop.apache.org/docs/current/hadoop-yarn
  • 15. End