Hadoop Installation and basic configuration

Hadoop HDFS/MapReduce
● Architecture
● Hardware
● Installation and Configuration
● Monitoring
● Namenode

HDFS Architecture
(diagram slide)

Replication
(diagram slide)

MapReduce
(diagram slide)

Hardware Requirements
● NameNode + JobTracker
– >= 2 cores
– >= 8 GB RAM
– >= 40 GB disk, RAID 10
● DataNode + TaskTracker
– >= 4 cores
– >= (1 (OS) + 1 (TT) + 1 (DN) + Reducers + Maps) GB RAM (a worked example follows this list)
– >= N GB disk space, JBOD (no RAID)

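A worked instance of the RAM formula; the slot counts here (10 map slots, 4 reduce slots) are assumptions for illustration, not figures from the deck:

    1 (OS) + 1 (TT) + 1 (DN) + 4 (reducers) + 10 (maps) = 17 GB RAM
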
Installation
● Download the tar file from hadoop.apache.org or use a prebuilt rpm (a fetch-and-unpack sketch follows this list)
● https://github.com/gerritjvv/repo
● http://bigtop.apache.org/

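A minimal sketch of the tarball route, assuming a Hadoop 1.x release and the /opt/hadoop prefix the later slides use; the version and mirror URL are assumptions:

    # fetch a release tarball (version is an assumption)
    wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
    # unpack under /opt and point /opt/hadoop at it so paths like /opt/hadoop/conf resolve
    tar -xzf hadoop-1.2.1.tar.gz -C /opt
    ln -s /opt/hadoop-1.2.1 /opt/hadoop
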
Configuration
● $HADOOP_HOME/conf/core-site.xml
● $HADOOP_HOME/conf/mapred-site.xml
● $HADOOP_HOME/conf/hdfs-site.xml
● http://hadoop.apache.org/docs/stable/cluster_setup

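All three files share the same Hadoop XML layout; a skeleton, where the property name and value are placeholders:

    <?xml version="1.0"?>
    <configuration>
      <property>
        <name>some.property</name>   <!-- placeholder -->
        <value>some-value</value>    <!-- placeholder -->
      </property>
    </configuration>
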
Configuration Namenode
● Create a directory for the namenode metadata
– /data/hadoop/name
● Open core-site.xml (both files are sketched after this slide)
– Define fs.default.name=hdfs://<host>:8020
● Open hdfs-site.xml
– Define dfs.name.dir=/data/hadoop/name
– Define dfs.replication=3
– Create the dir /data/hadoop/hdfs
– Define dfs.data.dir=/data/hadoop/hdfs
– Define dfs.http.address=localhost:50070
● Start the namenode with the format option
– /opt/hadoop/bin/hadoop namenode -format
– After the format, start the namenode with service hadoop-namenode start

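How the two files might look after these edits; a sketch, with <host> kept as the slide's placeholder:

    <!-- core-site.xml -->
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://<host>:8020</value>
      </property>
    </configuration>

    <!-- hdfs-site.xml -->
    <configuration>
      <property>
        <name>dfs.name.dir</name>
        <value>/data/hadoop/name</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
      <property>
        <name>dfs.data.dir</name>
        <value>/data/hadoop/hdfs</value>
      </property>
      <property>
        <name>dfs.http.address</name>
        <value>localhost:50070</value>
      </property>
    </configuration>
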
Configuration JobTracker
● Open /opt/hadoop/conf/mapred-site.xml (sketched after this slide)
– Define the property mapred.job.tracker=<host>:8021
– Create the directory /data/hadoop/mapred
– Define mapred.local.dir=/data/hadoop/mapred
● Start the JobTracker with service hadoop-jobtracker start

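The resulting mapred-site.xml, again with <host> as the placeholder:

    <!-- mapred-site.xml -->
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value><host>:8021</value>
      </property>
      <property>
        <name>mapred.local.dir</name>
        <value>/data/hadoop/mapred</value>
      </property>
    </configuration>
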
Configuration DataNode
● On each datanode create the directory /data/hadoop/hdfs (one directory per disk; see the multi-disk sketch after this slide)
● Open /opt/hadoop/conf/hdfs-site.xml
– Define dfs.http.address=<host>:50070
– Define dfs.data.dir=/data/hadoop/hdfs
● Start the datanodes with service hadoop-datanode start

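For the one-directory-per-disk case, dfs.data.dir accepts a comma-separated list of directories; a sketch assuming two data disks (the mount points are assumptions):

    <property>
      <name>dfs.data.dir</name>
      <value>/data1/hadoop/hdfs,/data2/hadoop/hdfs</value>
    </property>
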
Configuration MapReduce
● On each datanode create the directory /data/hadoop/mapred
● Open /opt/hadoop/conf/mapred-site.xml (slot settings sketched after this slide)
– Define mapred.local.dir=/data/hadoop/mapred
– Define mapred.tasktracker.map.tasks.maximum=<Number of map tasks>
– Define mapred.tasktracker.reduce.tasks.maximum=<Number of reduce tasks>
● Start the TaskTrackers with service hadoop-tasktracker start

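A sketch of the slot settings, using the same assumed counts as the RAM example on the hardware slide (10 maps, 4 reduces); choose values that fit that formula:

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>10</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>
    </property>
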
Monitoring
● Web HTML scraping
– https://github.com/gerritjvv/hadoop-monitoring
● Ganglia
– http://ganglia.info/?p=88
● Cacti
– http://blog.cloudera.com/blog/2009/07/hadoop-graphing

Namenode Edits
● Writes/updates/deletes are applied to the metadata in RAM and appended to a write-ahead log (the edits file)
● The metadata in RAM is only merged into a binary file (the fsimage) during the secondary namenode checkpoint
● This file corrupts easily
● Recovery is a manual task

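Both structures are visible on disk under the dfs.name.dir configured earlier; a quick check, assuming the Hadoop 1.x on-disk layout:

    ls /data/hadoop/name/current/
    # edits    - the write-ahead log of recent operations
    # fsimage  - the binary metadata file, rewritten at each checkpoint
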
HA
● YARN and Hadoop 2.0.0
● Experimental
● http://hadoop.apache.org/docs/current/hadoop-yarn

End