Hadoop Architecture: An Overview - Hari Shankar Sreekumar
Ideas
- Store and process large amounts of data (petabytes)
- Scale horizontally
- Failure is normal
- Distributed computing (MapReduce)
- Moving computation is cheaper than moving data
What is Hadoop?
- HDFS
- Hadoop Common
- MapReduce
- Pig, Hive, HBase, ZooKeeper, Avro, Cassandra, Mahout, ...
Hadoop Distributed File System
A distributed filesystem designed for storing very large files with streaming data access, running on clusters of commodity hardware. HDFS was designed with MapReduce in mind.
It consists of a cluster of machines, each performing one or more of the following roles:
- Namenode (only one per cluster)
- Secondary namenode / checkpoint node (only one per cluster)
- Datanodes (many per cluster)
HDFS Blocks
- Disk blocks: the minimum amount of data that can be read or written (~512 bytes).
- Filesystem blocks: an abstraction over disk blocks (~a few kilobytes).
- HDFS blocks: an abstraction over filesystem blocks, to facilitate distribution over the network and other requirements of Hadoop. Usually 64 MB or 128 MB (configurable; see the sketch below).
- The block abstraction keeps the design simple; e.g. replication is at the block level rather than the file level.
- A file is split into blocks for storage in HDFS; blocks of the same file can reside on multiple machines in the cluster.
- Each block is stored as a file in the local FS of the DataNode.
- Block size does not determine size on disk: a 1 MB file will not take up 64 MB on disk.
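The block size (and replication factor) can be set per file when it is created. A minimal sketch using the Java FileSystem API, assuming a reachable cluster configured via core-site.xml/hdfs-site.xml; the path and the 128 MB value are illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSizeExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);       // the configured default filesystem

            // create(path, overwrite, bufferSize, replication, blockSize):
            // 128 MB blocks and 3 replicas for this file only.
            FSDataOutputStream out = fs.create(new Path("/tmp/example.dat"),
                    true, 4096, (short) 3, 128L * 1024 * 1024);
            out.writeBytes("hello hdfs\n");
            out.close();
        }
    }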
Namenode and Datanodes
- The "master" node.
- Maintains the HDFS namespace: the filesystem tree and metadata.
- Maintains the mapping from each file to the list of block IDs making up the file.
- The metadata mapping is held in memory and also persisted on disk.
- Maintains in memory the locations of each block (block-to-datanode mapping).
- Memory requirement: ~150 bytes per file.
- Issues instructions to datanodes to create/replicate/delete blocks.
- Single point of failure.
Datanodes
- The "slaves": serve as storage for data blocks; hold no metadata.
- Report all of their blocks to the namenode at startup (BlockReport).
- Send a periodic "heartbeat" to the namenode.
- Serve read and write requests; perform block creation, deletion, and replication upon instruction from the namenode.
- User data never flows through the NameNode.
Secondary namenode / Checkpoint node
- Exists to reduce the risk of data loss if the namenode fails.
- Persistent data is stored in two files on the namenode: the FsImage and the edit log. Changes to file metadata go into the edit log.
- The secondary namenode periodically merges the edit log into the FsImage.
- Data loss can still occur if the namenode fails, so configure Hadoop to write its metadata to a remote NFS mount as well (see the configuration sketch below). In case of failure, copy the metadata files from NFS to the secondary namenode and run it as the namenode.
- The NFS approach has a (very low) performance impact.
- Failover is NOT automatic.
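In the Hadoop 1.x configuration this is driven by dfs.name.dir, which accepts a comma-separated list of directories that all receive the FsImage and edit log; the property normally lives in hdfs-site.xml. A sketch of the programmatic equivalent, with hypothetical paths (the second entry is assumed to be the NFS mount):

    import org.apache.hadoop.conf.Configuration;

    public class NamenodeDirs {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Namenode metadata is written to every listed directory;
            // /mnt/nfs/... is assumed to be a remote NFS mount.
            conf.set("dfs.name.dir", "/data/hdfs/name,/mnt/nfs/hdfs/name");
            System.out.println(conf.get("dfs.name.dir"));
        }
    }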
Image: Hadoop: The Definitive Guide (Tom White)
Replication and rack-awareness
- Replication in Hadoop is at the block level.
- Replication is "rack-aware". Three levels of placement preference: same machine > same rack > different rack.
- The replication factor can be configured per file, and can also be set from the application (see the sketch below).
- Selection of blocks to process in a MapReduce job takes advantage of rack-awareness.
- Reading from and writing to HDFS also make use of rack-awareness.
- Rack-awareness is NOT automatic and needs to be configured. By default, all nodes are assumed to be in the same rack.
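A small sketch of changing the replication factor of an existing file from application code (the path and factor are illustrative; the shell equivalent is hadoop fs -setrep):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SetReplication {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Ask for 5 replicas of this file; the namenode schedules the
            // extra copies asynchronously.
            boolean accepted = fs.setReplication(new Path("/data/important.log"), (short) 5);
            System.out.println("replication change accepted: " + accepted);
        }
    }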
Reading from HDFS
- On failure, the client moves to the next 'closest' node holding the block.
- Data flows over a direct connection between the client and the datanode (a read sketch follows).
Image: Hadoop: The Definitive Guide (Tom White)
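A minimal read sketch using the Java FileSystem API: open() asks the namenode for block locations, and the returned stream then pulls data directly from datanodes. The input path is hypothetical:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsRead {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Block locations come from the namenode; block data is streamed
            // directly from the closest datanode holding each block.
            FSDataInputStream in = fs.open(new Path("/data/input.txt"));
            BufferedReader reader = new BufferedReader(new InputStreamReader(in));
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
            reader.close();
        }
    }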
Writing to HDFS
- Minimum number of replicas required for a successful write: dfs.replication.min.
- Files in HDFS are write-once and have strictly one writer at any time (a write sketch follows).
Image: Hadoop: The Definitive Guide (Tom White)
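A matching write sketch: create() asks the namenode to allocate blocks, and the client then streams data through a pipeline of datanodes, one per replica. The path is hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWrite {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Data is pushed through a datanode pipeline; close() succeeds once
            // at least dfs.replication.min replicas have acknowledged the write.
            FSDataOutputStream out = fs.create(new Path("/data/output.txt"));
            out.writeBytes("write-once file contents\n");
            out.close();
        }
    }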
Hadoop Common
- File system abstraction: the File System (FS) shell includes various shell-like commands that directly interact with HDFS as well as the other file systems Hadoop supports, such as the Local FS, HFTP FS, and S3 FS (a programmatic sketch follows).
- Service-level authorization: the initial authorization mechanism, ensuring that clients connecting to a particular Hadoop service have the necessary, pre-configured permissions and are authorized to access the given service. For example, a MapReduce cluster can use this mechanism to allow only a configured list of users/groups to submit jobs.
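The same abstraction is exposed to applications: the Java FileSystem class picks a concrete implementation from the URI scheme, so HDFS and the local filesystem are used through one API. A sketch with illustrative URIs:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FsAbstraction {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // The scheme selects the implementation: hdfs://, file://, s3://, ...
            FileSystem hdfs  = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
            FileSystem local = FileSystem.get(URI.create("file:///"), conf);

            for (FileStatus status : hdfs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
            System.out.println(local.exists(new Path("/tmp")));
        }
    }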
Data Integrity
- A separate 32-bit checksum is created for every io.bytes.per.checksum bytes (default 512 bytes; overhead < 1%).
- Checksums are stored with each data block.
- Checksums are verified after each operation that might result in data corruption, and are also checked periodically.
- Checksumming can be used in non-HDFS filesystems as well (see the sketch below).
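On the client side, checksum verification can be switched off per FileSystem instance, e.g. to salvage the readable bytes of a file with a known-corrupt block. A small sketch with a hypothetical path:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SkipChecksum {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Disable client-side checksum verification for this instance only.
            fs.setVerifyChecksum(false);
            fs.open(new Path("/data/possibly-corrupt.dat")).close();
        }
    }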
Compression utilities
- Reduce space usage.
- Reduce bandwidth usage.
- Splittable LZO is available separately and is a good trade-off between compression speed and compressed size (a codec sketch follows).
Ref: Hadoop: The Definitive Guide (Tom White)
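A sketch of writing a compressed file through Hadoop's codec API, using the built-in GzipCodec (the output path is illustrative; LZO would be used the same way once its codec is installed):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionOutputStream;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.util.ReflectionUtils;

    public class CompressedWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // Codecs are instantiated via ReflectionUtils so they see the Configuration.
            CompressionCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);
            CompressionOutputStream out =
                    codec.createOutputStream(fs.create(new Path("/data/out.gz")));
            out.write("compressed contents\n".getBytes("UTF-8"));
            out.close();
        }
    }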
Serialization utilities
- Serialization is extremely important for Hadoop. A good serialization format is compact, fast, extensible, and interoperable.
- Java serialization is too cumbersome and heavy for Hadoop, so Hadoop uses its own serialization, based on the Writable interface (a sketch follows).
- Other frameworks such as Avro, Thrift, and Protocol Buffers are also used.
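A minimal custom Writable as a sketch; write and readFields are the whole contract, and the field names here are made up:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    // A (count, weight) pair serialized as two fixed-width fields.
    public class CountWeight implements Writable {
        private long count;
        private double weight;

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(count);
            out.writeDouble(weight);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            count = in.readLong();
            weight = in.readDouble();
        }
    }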
MapReduce Framework
- The Jobtracker receives a MapReduce job execution request from the client.
- It does sanity checks to see whether the job is configured properly, computes the input splits, and loads the resources required for the job into HDFS.
- It assigns splits to tasktrackers for the map and reduce phases; map split assignment is data-locality-aware.
- The Jobtracker is a single point of failure.
- A Tasktracker creates a new process for each task and executes it, and sends periodic heartbeats to the Jobtracker along with other information about the task.
(A minimal client-side job is sketched below.)
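For context, a minimal job as the client submits it: the classic word count, written against the MRv1-era org.apache.hadoop.mapreduce API used with the Jobtracker; input and output paths come from the command line:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        public static class TokenizerMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    if (!token.isEmpty()) {
                        word.set(token);
                        context.write(word, ONE);          // emit (word, 1)
                    }
                }
            }
        }

        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));  // emit (word, total)
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }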
Image: Hadoop: The Definitive Guide (Tom White)
References
- http://hadoop.apache.org/common/docs/current/hdfs_design.html
- Hadoop: The Definitive Guide, Tom White. O'Reilly Media, 2009. ISBN 978-0-596-52197-4.
