Your SlideShare is downloading. ×
Hadoop Installation and basic configuration
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Introducing the official SlideShare app

Stunning, full-screen experience for iPhone and Android

Text the download link to your phone

Standard text messaging rates apply

Hadoop Installation and basic configuration

3,136
views

Published on

Published in: Technology, News & Politics

0 Comments
8 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
3,136
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
307
Comments
0
Likes
8
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Hadoop HDFS/MapReduce Architecture Hardware Installation and Configuration Monitoring Namenode
  • 2. HDFS Architecture
  • 3. Replication
  • 4. Map Reduce
  • 5. Hardware Requirements ● NameNode + JobTracker – >= 2 cores – >= 8 gigs ram – >= 40gig disk RAID 10 ● DataNode + TaskTracker – >= 4 cores – >= (+ 1 (os) 1 (TT) 1 (DN) Reducers Maps) Gig RAM – >= N Gig disk space JBOD (no raid)
  • 6. Installation ● Download tar file from hadoop or use a prebuilt rpm ● https://github.com/gerritjvv/repo ● http://bigtop.apache.org/
  • 7. Configuration ● $HADOOP_HOME/conf/core-site.xml ● $HADOOP_HOME/conf/mapred-site.xml ● $HADOOP_HOME/conf/hdfs-site.xml ● http://hadoop.apache.org/docs/stable/cluster_setup ●
  • 8. Configuration Namenode ● Create directory for namenode metadata – /data/hadoop/name ● Open core-site.xml – Define fs.default.name = http://<host>:8020 ● Open hdfs-site.xml – Define dfs.name.dir=/data/hadoop/name – Define dfs.replication=3 – Create dir /data/hadoop/hdfs – Define dfs.data.dir=/data/hadoop/hdfs – Defin dfs.http.address=localhost:50070 ● Start the namenode with the format option – /opt/hadoop/bin/hadoop namenode -format – After the format start the namenode with service hadoop-namenode start
  • 9. Configuration JobTracker ● Open /opt/hadoop/conf/mapred-site.xml – Define the property mapred.job.tracker=<host>:8021 – Create the directory /data/hadoop/mapred – Define mapred.local.dir=/data/hadoop/mapred ● Start the JobTracker with service hadoop- jobtracker start
  • 10. Configuration DataNode ● On each datanode create the directory /data/hadoop/hdfs (one directory per disk) ● Open /opt/hadoop/conf/hdfs-site.xml – Define dfs.http.address=<host>:50070 – Define dfs.data.dir=/data/hadoop/hdfs ● Start the datanodes with service hadoop- datanode start
  • 11. Configuration Mapreduce ● On each datanode create the directory /data/hadoop/mapred ● Open /opt/hadoop/conf/mapred-site.xml – Define mapred.local.dir=/data/hadoop/mapred – Define mapred.tasktracker.map.tasks.maximum=<Number of map tasks> – Define mapred.tasktracker.reduce.tasks.maximum=<Number of reduce tasks> ● Start the TaskTrackers with service hadoop-tasktracker start
  • 12. Monitoring ● Web Html scraping – https://github.com/gerritjvv/hadoop-monitoring ● Glanglia – http://ganglia.info/?p=88 ● Cacti – http://blog.cloudera.com/blog/2009/07/hadoop-graphing
  • 13. Namenode Edits ● Writes/Updates/Deletes are written to RAM and to a write ahead log. ● The metadata in RAM is only merged into a binary file during the secondary namenode checkpoint ● This file corrupts easily ● Recovery is a manual task
  • 14. HA ● Yarn and Hadoop 2.0.0 ● Experimental ● http://hadoop.apache.org/docs/current/hadoop-yarn
  • 15. End