Topic 1
HDFS – Hands On (Part – 1)
Class 2 – Hadoop Distributed File System
AGENDA
• What is Big Data?
• Hadoop Distributed File System
• MapReduce
• Understanding Hadoop Ecosystem
• Setting up a Hadoop Cluster
• HDFS – Hands On
• MapReduce-Hands On
Pre-requisites
HDFS – Hands On
• The Virtual Machine is up and running.
• You are connected to the Virtual Machine through PuTTY as ‘hduser’.
Command Syntax
HDFS – Hands On
hadoop fs -ls / (To list directory contents)
hadoop fs -<command> <args>
hadoop: The binary executable.
fs: Invokes the Hadoop file system shell, which operates on HDFS in this setup.
<command>: Indicates the purpose of the statement; always preceded by a ‘-’ (plain hyphen).
<args>: The arguments applicable to the command.
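The `hadoop fs -<command> <args>` pattern can be illustrated with a small Python sketch that only assembles the argument list and does not run Hadoop. The helper name `hadoop_fs_command` is hypothetical and exists purely for illustration.

```python
import shlex


def hadoop_fs_command(command, *args):
    """Build the argument list for `hadoop fs -<command> <args>`."""
    if command.startswith("-"):
        raise ValueError("pass the command name without the leading '-'")
    # The prefix must be an ASCII hyphen; an en dash pasted from a slide
    # is a different character and will not be read as an option prefix.
    return ["hadoop", "fs", "-" + command, *args]


argv = hadoop_fs_command("ls", "/")
print(shlex.join(argv))  # hadoop fs -ls /
```

The resulting list could be handed to `subprocess.run` on a machine where the `hadoop` binary is on the PATH.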
Where do DataNodes store data?
HDFS – Hands On
hadoop.tmp.dir = /tmp/hadoop
dfs.data.dir = ${hadoop.tmp.dir}/dfs/data
             = /tmp/hadoop/dfs/data
Files found under this directory:
VERSION >> Java properties file
blk_********* >> Raw data of a file
blk_******.meta >> Metadata of the block
How come there is a block when we have not loaded any file?
The block belongs to jobtracker.info, a small file that the JobTracker writes to HDFS when it starts.
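How dfs.data.dir ends up as /tmp/hadoop/dfs/data can be sketched in Python. This is a simplified illustration of `${name}` substitution between properties, not Hadoop's own implementation; the function name `resolve` and the iteration bound are assumptions made for the example.

```python
import re


def resolve(props, key):
    """Expand ${name} references in a property value using the other properties."""
    pattern = re.compile(r"\$\{([^}]+)\}")
    value = props[key]
    # Expand repeatedly so that chained references are resolved too.
    for _ in range(20):  # small bound guards against cyclic definitions
        expanded = pattern.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        if expanded == value:
            return value
        value = expanded
    raise ValueError("too many nested substitutions for " + key)


props = {
    "hadoop.tmp.dir": "/tmp/hadoop",
    "dfs.data.dir": "${hadoop.tmp.dir}/dfs/data",
}
print(resolve(props, "dfs.data.dir"))  # /tmp/hadoop/dfs/data
```

Unknown references are left untouched, so a typo in a property name stays visible in the resolved value.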
fsck
HDFS – Hands On
Generates a summary report on the overall health of the file system.
hadoop fsck /
Total size: Size of the directory checked (the root directory in our case). Does not account for replication.
Total dirs: Number of directories in HDFS.
Total files: Number of files in HDFS.
Total blocks: Number of blocks.
Default replication factor: Replication factor configured for the cluster.
Average replication factor: Average number of replicas actually present per block.
Corrupt blocks: Number of corrupt blocks.
Missing replicas: Number of replicas missing relative to the expected replication.
Number of data nodes: Number of DataNodes in the cluster.
Number of racks: Number of racks in the cluster.
Edit .bashrc
HDFS – Hands On
Navigate to the home directory.
cd
List hidden files.
ls -a
Edit the .bashrc file.
vi .bashrc
Update HADOOP paths using ‘export’ command.
export HADOOP_CONF=/home/hduser/hadoop/conf
export HADOOP_PREFIX=/home/hduser/hadoop
# Add Hadoop bin/ directory to path
export PATH=$PATH:$HADOOP_PREFIX/bin
Execute the updated contents of the .bashrc file.
source ~/.bashrc
copyFromLocal
HDFS – Hands On
Copies a file from the local file system to HDFS.
hadoop fs -copyFromLocal <Path to source file on Local File System> <Target path in HDFS>
hadoop fs -copyFromLocal NOTICE.txt noticehdfs.txt
copyFromLocal
HDFS – Hands On
Internally, a copyFromLocal command results in:
• the file being split into blocks (more than one if it is larger than the block size).
• the client contacting the NameNode to find out where each block should be copied in the cluster.
• replication of the blocks to the nodes assigned by the NameNode.
How many blocks were created?
HDFS – Hands On
RECAP
HDFS Commonly used commands
HDFS Concepts
Topic 2
HDFS – Hands On (Part – 2)
Load a file larger than the block size
HDFS – Hands On
Load a 200 MB file and see how many blocks are created.
Command to generate a 200 MB dummy file (1024 × 204800 bytes):
dd if=/dev/zero of=file.txt count=1024 bs=204800
Load it into HDFS:
hadoop fs -copyFromLocal file.txt file.txt
Inspect the block files on the DataNode:
cd /tmp/hadoop/dfs/data/current
ls -lrt
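Load a file larger than the block size
HDFS – Hands On
With a 64 MB block size, the 200 MB file is stored as 4 blocks (200 = 3 × 64 + 8):
• three blocks of 64 MB
• one block of 8 MB, holding the tail of the file
The listing may not show the blocks in file order, so the 8 MB block need not appear last.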
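The block arithmetic for this file can be checked with a short Python sketch. It assumes the 64 MB block size shown on the slide; the function name `block_sizes` is hypothetical.

```python
MB = 1024 * 1024


def block_sizes(file_size, block_size=64 * MB):
    """Return the sizes (in bytes) of the blocks a file is split into."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, tail = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if tail:
        sizes.append(tail)  # the last block holds only the remaining bytes
    return sizes


file_size = 1024 * 204800  # bytes written by the dd command: count * bs
sizes = block_sizes(file_size)
print(len(sizes), [s // MB for s in sizes])  # 4 [64, 64, 64, 8]
```

The last block occupies only the space it needs; it is not padded to the full block size.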
fsck
HDFS – Hands On
fsck after loading 2 additional files.
Total size has increased.
Total dirs: 7. Additions - /user and /user/hduser directories.
Total files: 3. Additions - 2 newly loaded files.
Total blocks: 6. Additions - 1 block of the 1st file and 4 blocks of the 2nd file.
cat
HDFS – Hands On
Displays the contents of a file on the command prompt.
hadoop fs -cat <Path of file in HDFS>
hadoop fs -cat noticehdfs.txt
copyToLocal
HDFS – Hands On
Copies a file from HDFS to the local file system.
hadoop fs -copyToLocal <Path of file in HDFS> <Path of file in Local File System>
hadoop fs -copyToLocal noticehdfs.txt noticelocal.txt
mkdir
HDFS – Hands On
Creates a directory inside HDFS.
A path without a leading ‘/’ is relative to the current user’s HDFS home directory (/user/hduser here).
Creates a directory in the current user’s home directory:
hadoop fs -mkdir newdir
Creates a new directory under root:
hadoop fs -mkdir /newdir
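The difference between the two mkdir paths can be sketched in Python. The sketch assumes the home directory is /user/<username>, matching the /user/hduser directory used in this session; `resolve_hdfs_path` is a hypothetical helper, not a Hadoop API.

```python
import posixpath


def resolve_hdfs_path(path, user):
    """Resolve a path the way the shell treats it: relative paths live under the user's home."""
    home = posixpath.join("/user", user)
    # posixpath.join discards `home` when `path` is already absolute.
    return posixpath.normpath(posixpath.join(home, path))


print(resolve_hdfs_path("newdir", "hduser"))   # /user/hduser/newdir
print(resolve_hdfs_path("/newdir", "hduser"))  # /newdir
```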
rm
HDFS – Hands On
Removes file(s) and empty directories.
hadoop fs -rm <File Name>
hadoop fs -rm noticehdfs.txt
Trash feature
HDFS – Hands On
Prevents accidental deletion of files and directories.
Disabled by default.
To enable, set the fs.trash.interval property in the core-site.xml file to a value greater than 0 (the retention time in minutes).
RECAP
HDFS Commonly used commands
HDFS Concepts
Topic 3
HDFS – Web UI
NameNode Web Interface
HDFS – Hands On
HDFS Web Interface URL.
http://<namenode_host>:50070/
From the Virtual Machine:
http://localhost:50070/
From outside the Virtual Machine:
http://<IP Address of VM or Hostname of VM>:50070/
Example: http://192.168.234.135:50070/
NameNode Web Interface
HDFS – Hands On
The NameNode page shows:
• Server name and port
• Last start time of the NameNode
• Hadoop version, followed by the Subversion source code repository revision
• Link to browse the files in HDFS
• Link to view the NameNode log files
• Number of files, directories and blocks; heap memory utilized/available
• Storage capacity of the machines in the cluster
• Space utilized in HDFS
• Space utilized by the O/S, applications etc.
• Space available on HDFS
• Number of blocks with fewer replicas than the Replication Factor
• Nodes that are active and in contact with the NameNode
• Nodes that are NOT in contact with the NameNode
• Nodes administratively removed from the cluster
RECAP
HDFS Web UI
Topic 4
MapReduce – Hands On (Part – 1)
How does MapReduce work?
MapReduce
Mapping Phase: Map Input List → Mapper → Map Output List
Reducing Phase: Reduce Input List → Reducer → Reduce Output List
Hadoop MapReduce
MapReduce
Word count example. Stages: Input → Splitting → Map → Shuffling → Reduce → Result
Input:
King Queen King
Minister King Soldier
Queen Soldier King
Splitting (one record per line, as <line number, line>):
<1, King Queen King>
<2, Minister King Soldier>
<3, Queen Soldier King>
Map Output:
<King, 1> <Queen, 1> <King, 1>
<Minister, 1> <King, 1> <Soldier, 1>
<Queen, 1> <Soldier, 1> <King, 1>
Shuffling (values grouped by key):
<King, (1,1,1,1)>
<Minister, (1)>
<Queen, (1,1)>
<Soldier, (1,1)>
Reduce / Result:
<King, 4>
<Minister, 1>
<Queen, 2>
<Soldier, 2>
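The word count flow can be reproduced in plain Python to make each stage concrete. This is a single-process sketch of the idea, not Hadoop code; the function names `map_phase`, `shuffle_phase` and `reduce_phase` are hypothetical.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every <line number, line> record."""
    for _line_no, line in records:
        for word in line.split():
            yield (word, 1)


def shuffle_phase(pairs):
    """Shuffle: group the values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reduce: sum the list of values for each key."""
    return {key: sum(values) for key, values in groups.items()}


lines = ["King Queen King", "Minister King Soldier", "Queen Soldier King"]
records = list(enumerate(lines, start=1))  # splitting: <line number, line>
groups = shuffle_phase(map_phase(records))
result = reduce_phase(groups)
print(result)  # {'King': 4, 'Minister': 1, 'Queen': 2, 'Soldier': 2}
```

In a real job only the map and reduce logic would be user code; the grouping step is done by the framework.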
Hadoop MapReduce – Roles: User vs. Framework
MapReduce
Word count flow: Input → Splitting → Map → Shuffling → Reduce → Result
Annotations on the flow, in the order they appear:
• Load data into HDFS
• Specify Path & Input Format
• Create ‘Input Splits’
• Create individual Records
• User Defined Logic (Map)
• User Defined Logic (Reduce)
• Specify Path & Output Format
• Replication, Rack Awareness etc.
MapReduce Execution Framework
MapReduce
Reduce Process
Mapper Process
Input HDFS File - inputFile.txt
Block A Block B Block C
Driver
Mapper
Reducer
Record
Reader
Input Split 1 Input Split 2 Input Split 3 Input Split 4
Writer
InputFormat
Output Data
Reduce Process
Mapper Process
Mapper
Reducer
Record
Reader
Writer
Output Data
Reads
Passes <K,V> pairs
Writes
Reads
Passes <K,V> pairs
Writes
OutputFormat
Defines
Calculates
Defines
Defines
Defines
Passes <K,V> pairs
Passes <K,V> pairs
<K, V> pairs <K, V> pairs
Partition Shuffle
Sort
MapReduce Execution Framework
MapReduce
Reduce Process
Mapper Process
Input HDFS File - inputFile.txt
Block A Block B Block C
Driver
Mapper
Reducer
Record
Reader
Input Split 1 Input Split 2 Input Split 3 Input Split 4
Writer
InputFormat
Output Data
Reduce Process
Mapper Process
Mapper
Reducer
Record
Reader
Writer
Output Data
Reads
Passes <K,V> pairs
Writes
Reads
Passes <K,V> pairs
Writes
OutputFormat
Defines
Defines
Calculates
Defines
Defines
Defines
Passes <K,V> pairs
Passes <K,V> pairs
<K, V> pairs <K, V> pairs
Partition Shuffle
Sort
MapReduce Execution Framework
MapReduce
Reduce Process
Mapper Process
Input HDFS File - inputFile.txt
Block A Block B Block C
Driver
Mapper
Reducer
Record
Reader
Input Split 1 Input Split 2 Input Split 3 Input Split 4
Writer
InputFormat
Output Data
Reduce Process
Mapper Process
Mapper
Reducer
Record
Reader
Writer
Output Data
Reads
Passes <K,V> pairs
Writes
Reads
Passes <K,V> pairs
Writes
OutputFormat
Defines
Defines
Calculates
Defines
Defines
Defines
Defines
Passes <K,V> pairs
Passes <K,V> pairs
<K, V> pairs <K, V> pairs
Partition Shuffle
Sort
MapReduce Execution Framework
MapReduce
Reduce Process
Mapper Process
Input HDFS File - inputFile.txt
Block A Block B Block C
Driver
Mapper
Reducer
Record
Reader
Input Split 1 Input Split 2 Input Split 3 Input Split 4
Writer
InputFormat
Output Data
Reduce Process
Mapper Process
Mapper
Reducer
Record
Reader
Writer
Output Data
Reads
Passes <K,V> pairs
Writes
Reads
Passes <K,V> pairs
Writes
OutputFormat
Defines
Defines
Calculates
Defines
Defines
Defines
Defines
Defines
Passes <K,V> pairs
Passes <K,V> pairs
<K, V> pairs <K, V> pairs
Partition Shuffle
Sort
RECAP
MapReduce Execution Framework
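The Partition, Shuffle and Sort stages named in the diagram can be sketched in a few lines. This is a plain-Python illustration, not Hadoop code: `partition` and `shuffle_and_sort` are hypothetical names, and `zlib.crc32` is used only because it gives the same answer on every run (Hadoop's default partitioner hashes the key object instead).

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Pick the reducer that receives a key (assumption: crc32 as a stable hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(mapper_outputs, num_reducers):
    """Route every <K,V> pair to its reducer, then sort and group by key."""
    per_reducer = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapper_outputs:  # one list of pairs per Mapper Process
        for key, value in pairs:
            per_reducer[partition(key, num_reducers)][key].append(value)
    # Each reducer sees its keys in sorted order, with all values grouped.
    return [sorted(groups.items()) for groups in per_reducer]


# Illustrative <K,V> pairs emitted by two Mapper Processes.
mapper_outputs = [
    [("King", 1), ("Queen", 1), ("King", 1)],
    [("Soldier", 1), ("King", 1)],
]
reducer_inputs = shuffle_and_sort(mapper_outputs, num_reducers=2)
for reducer_id, groups in enumerate(reducer_inputs):
    print(reducer_id, groups)
```

Because the reducer is chosen from the key alone, all values for one key end up at the same Reduce Process, whichever Mapper Process emitted them.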
BUMPER
BUMPER
Topic 5
Class 2 – Hadoop Distributed File System
MapReduce – Hands On (Part – 2)
AGENDA
• What is Big Data?
• Hadoop Distributed File System
• MapReduce
• Understanding Hadoop Ecosystem
• Setting up a Hadoop Cluster
• HDFS – Hands On
• MapReduce-Hands On
Java MapReduce Programming
MapReduce
Hello World of MapReduce >> Word Count program
Eclipse – Integrated Development Environment (IDE)
https://www.eclipse.org/downloads/
RECAP
Part two of Java MapReduce program
BUMPER

Bd class 2 complete

  • 3.
    Topic 1
    HDFS – Hands On (Part – 1)
    Class 2 – Hadoop Distributed File System
  • 4.
    AGENDA
    • What is Big Data?
    • Hadoop Distributed File System
    • MapReduce
    • Understanding Hadoop Ecosystem
    • Setting up a Hadoop Cluster
    • HDFS – Hands On
    • MapReduce-Hands On
  • 5.
    Pre-requisites
    HDFS – Hands On
    Virtual Machine is up and running.
    Connected to your Virtual Machine using PuTTY as ‘hduser’.
  • 11.
    Command Syntax
    HDFS – Hands On
    hadoop fs -ls / (To list directory contents)
    hadoop fs -<command> <args>
    hadoop: The binary executable.
    fs: Invokes the Hadoop file system shell; here the file system is HDFS.
    <command>: Indicates the purpose of the statement; always preceded by a ‘-’ (a plain ASCII hyphen).
    <args>: The arguments that apply to the command.
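The `hadoop fs -<command> <args>` pattern can be captured in a small helper that assembles the argument list. A minimal sketch: `hadoop_fs_argv` is a hypothetical name, and it only builds the list; actually running it (for example with `subprocess.run`) assumes the `hadoop` binary is on the PATH of the Virtual Machine.

```python
def hadoop_fs_argv(command, *args):
    """Build the argument list for `hadoop fs -<command> <args>`."""
    if command.startswith("-"):
        raise ValueError("pass the command name without the leading '-'")
    # The '-' in front of the command must be a plain ASCII hyphen.
    return ["hadoop", "fs", "-" + command, *args]


print(hadoop_fs_argv("ls", "/"))
print(hadoop_fs_argv("copyFromLocal", "NOTICE.txt", "noticehdfs.txt"))
```

Building a list instead of a single string avoids quoting problems when a path contains spaces.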
  • 16.
    Where do DataNodes store data?
    HDFS – Hands On
    hadoop.tmp.dir = /tmp/hadoop
    dfs.data.dir = ${hadoop.tmp.dir}/dfs/data
                 = /tmp/hadoop/dfs/data
    VERSION >> Java properties file
    blk_********* >> Raw data of a file
    blk_******.meta >> Metadata of the block
    How come there is a block when we have not loaded any file?
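How dfs.data.dir ends up as /tmp/hadoop/dfs/data can be shown with a small substitution routine. This is a sketch, not Hadoop's own configuration loader: `resolve` is a hypothetical helper, and it assumes the `${name}` reference syntax used in Hadoop configuration files.

```python
import re


def resolve(name, props):
    """Expand ${other.property} references in a property value."""
    value = props[name]
    # Replace each ${...} with the (recursively resolved) referenced property.
    return re.sub(r"\$\{([^}]+)\}", lambda m: resolve(m.group(1), props), value)


props = {
    "hadoop.tmp.dir": "/tmp/hadoop",
    "dfs.data.dir": "${hadoop.tmp.dir}/dfs/data",
}
print(resolve("dfs.data.dir", props))  # prints /tmp/hadoop/dfs/data
```

Changing only hadoop.tmp.dir therefore moves the DataNode storage directory as well, unless dfs.data.dir is set explicitly.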
  • 18.
    fsck
    HDFS – Hands On
    Generates a summary report that lists the overall health of the filesystem.
  • 19.
    fsck
    HDFS – Hands On
    Total size: Size of the directory (the root directory in our case). Does not account for replication.
    Total dirs: Number of directories in HDFS.
    Total files: Number of files in HDFS.
    Total blocks: Number of blocks.
    Default replication factor:
    Average replication factor:
    Corrupt blocks:
    Missing replicas:
    Number of data nodes:
    Number of racks:
  • 20.
    Edit .bashrc
    HDFS – Hands On
    Navigate to the home directory.
    cd
    List hidden files.
    ls -a
    Edit the .bashrc file.
    vi .bashrc
    Update HADOOP paths using the ‘export’ command.
    export HADOOP_CONF=/home/hduser/hadoop/conf
    export HADOOP_PREFIX=/home/hduser/hadoop
    # Add Hadoop bin/ directory to path
    export PATH=$PATH:$HADOOP_PREFIX/bin
    Execute the updated contents of the .bashrc file.
    source ~/.bashrc
  • 21.
    copyFromLocal
    HDFS – Hands On
    Copies a file from the local file system to HDFS.
    hadoop fs -copyFromLocal <Path to source file on Local File System> <Target path in HDFS>
    hadoop fs -copyFromLocal NOTICE.txt noticehdfs.txt
  • 22.
    copyFromLocal
    HDFS – Hands On
    The copyFromLocal command internally results in:
    • the file getting split into multiple blocks.
    • the client contacting the NameNode to find out where each block should be copied in the cluster.
    • replication of blocks to the nodes assigned by the NameNode.
  • 23.
    How many blocks were created?
    HDFS – Hands On
  • 24.
    RECAP
    HDFS Commonly used commands
    HDFS Concepts
  • 28.
    Topic 2
    HDFS – Hands On (Part – 2)
    Class 2 – Hadoop Distributed File System
  • 29.
    AGENDA
    • What is Big Data?
    • Hadoop Distributed File System
    • MapReduce
    • Understanding Hadoop Ecosystem
    • Setting up a Hadoop Cluster
    • HDFS – Hands On
    • MapReduce-Hands On
  • 30.
    Load a file larger than the block size
    HDFS – Hands On
    Load a 200 MB file and see how many blocks were created.
    Command to generate a 200 MB dummy file.
    dd if=/dev/zero of=file.txt count=1024 bs=204800
    hadoop fs -copyFromLocal file.txt file.txt
    cd /tmp/hadoop/dfs/data/current
    ls -lrt
  • 31.
    Load a file larger than the block size
    HDFS – Hands On
    Block 1 = 64 MB
    Block 2 = 64 MB
    Block 3 = 8 MB
    Block 4 = 64 MB
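The block count follows from simple arithmetic. A minimal sketch, assuming the 64 MB block size shown on the slide; `split_into_blocks` is a hypothetical helper, not part of Hadoop.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes (in bytes) of the blocks a file of file_size bytes occupies."""
    full, remainder = divmod(file_size, block_size)
    return [block_size] * full + ([remainder] if remainder else [])


MB = 1024 * 1024
file_size = 1024 * 204800  # dd count=1024 bs=204800
blocks = split_into_blocks(file_size, 64 * MB)
print(file_size // MB, [b // MB for b in blocks])  # prints 200 [64, 64, 64, 8]
```

Three full 64 MB blocks plus one 8 MB remainder give the four blocks seen in the data directory; the order in which the slide lists them differs, but the sizes match.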
  • 32.
    fsck
    HDFS – Hands On
    fsck after loading 2 additional files.
    Total size has increased.
    Total dirs: 7. Additions - /user and /user/hduser directories.
    Total files: 3. Additions - 2 newly loaded files.
    Total blocks: 6. Additions - 1 block of the 1st file and 4 blocks of the 2nd file.
  • 33.
    cat
    HDFS – Hands On
    Displays the contents of a file on the command prompt.
    hadoop fs -cat <Path of file in HDFS>
    hadoop fs -cat noticehdfs.txt
  • 34.
    copyToLocal
    HDFS – Hands On
    Copies a file from HDFS to the local file system.
    hadoop fs -copyToLocal <Path of file in HDFS> <Path of file in Local File System>
    hadoop fs -copyToLocal noticehdfs.txt noticelocal.txt
  • 35.
    mkdir
    HDFS – Hands On
    Creates a directory inside HDFS. A path without a leading ‘/’ is relative to the current user’s home directory.
    Creates a directory in the current user’s home directory:
    hadoop fs -mkdir newdir
    Creates a new directory under root:
    hadoop fs -mkdir /newdir
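The difference between `newdir` and `/newdir` can be illustrated with a small path resolver. A sketch under one assumption: the HDFS home directory is /user/<user>, which matches the /user/hduser directory reported by fsck. `resolve_hdfs_path` is a hypothetical helper.

```python
import posixpath


def resolve_hdfs_path(path, user):
    """Resolve an HDFS path; relative paths start at /user/<user>."""
    if path.startswith("/"):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join("/user", user, path))


print(resolve_hdfs_path("newdir", "hduser"))   # prints /user/hduser/newdir
print(resolve_hdfs_path("/newdir", "hduser"))  # prints /newdir
```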
  • 36.
    rm
    HDFS – Hands On
    Removes file(s).
    hadoop fs -rm <File Name>
    Removes files and empty directories.
    hadoop fs -rm noticehdfs.txt
  • 37.
    Trash feature
    HDFS – Hands On
    Prevents accidental deletion of files and directories.
    Disabled by default.
    To enable, configure the fs.trash.interval property in the core-site.xml file.
  • 38.
    RECAP
    HDFS Commonly used commands
    HDFS Concepts
  • 42.
    Topic 3
    HDFS – Web UI
    Class 2 – Hadoop Distributed File System
  • 43.
    AGENDA
    • What is Big Data?
    • Hadoop Distributed File System
    • MapReduce
    • Understanding Hadoop Ecosystem
    • Setting up a Hadoop Cluster
    • HDFS – Hands On
    • MapReduce-Hands On
  • 44.
    NameNode Web Interface
    HDFS – Hands On
    HDFS Web Interface URL: http://<namenode_host>:50070/
    From the Virtual Machine: http://localhost:50070/
    From outside the Virtual Machine: http://<IP Address of VM or Hostname of VM>:50070/
    Example: http://192.168.234.135:50070/
  • 45.
    NameNode Web Interface
    HDFS – Hands On
    Server name and port
    Last start time of the NameNode
    Hadoop version, followed by the Subversion source code repository
    To browse the files in HDFS
    View NameNode log files
    Number of files, directories and blocks
    Heap memory utilized/available
    Storage capacity of machines in the cluster
    How much space is utilized in HDFS
    Space utilized by O/S, applications etc.
    Amount of space available on HDFS
    How many blocks have fewer replicas than the Replication Factor
    Nodes that are active and in contact with the NameNode
    Nodes that are NOT in contact with the NameNode
    Nodes administratively removed from the cluster
  • 50.
    Topic 4
    Class 2 – Hadoop Distributed File System
    MapReduce – Hands On (Part – 1)
  • 51.
    AGENDA
    • What is Big Data?
    • Hadoop Distributed File System
    • MapReduce
    • Understanding Hadoop Ecosystem
    • Setting up a Hadoop Cluster
    • HDFS – Hands On
    • MapReduce-Hands On
  • 61.
    How does MapReduce work?
    MapReduce
    [Diagram, built up over slides 52 to 61: Mapping Phase (Map Input List, Mapper, Map Output List); Reducing Phase (Reduce Input List, Reducer, Reduce Output List).]
  • 62.
    Hadoop MapReduce
    MapReduce
    Stages: Input, Splitting, Map, Shuffling, Reduce, Result
    King Queen King
    Minister King Soldier
    Queen Soldier King
    <1, King Queen King>
    <2, Minister King Soldier>
    <3, Queen Soldier King>
    Map Output:
    <King, 1> <Queen, 1> <King, 1>
    <Minister, 1> <King, 1> <Soldier, 1>
    <Queen, 1> <Soldier, 1> <King, 1>
    <King, 1> <King, 1> <King, 1> <King, 1>
    <Minister, 1>
    <Queen, 1> <Queen, 1>
    <Soldier, 1> <Soldier, 1>
    <King, (1,1,1,1)>
    <Minister, 1>
    <Queen, (1,1)>
    <Soldier, (1,1)>
    <King, 4>
    <Minister, 1>
    <Queen, 2>
    <Soldier, 2>
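The word count flow on this slide can be reproduced with plain Python. This is a sketch of the idea, not Hadoop code: `map_words` and `reduce_counts` are hypothetical names standing in for the user-written Mapper and Reducer, and the sort-and-group step stands in for what the framework does during shuffling.

```python
from itertools import groupby


def map_words(record_key, line):
    """Mapper: emit <word, 1> for every word in one input record."""
    return [(word, 1) for word in line.split()]


def reduce_counts(word, ones):
    """Reducer: sum the list of 1s collected for one word."""
    return (word, sum(ones))


# The three <K, V> records from the slide.
records = [
    (1, "King Queen King"),
    (2, "Minister King Soldier"),
    (3, "Queen Soldier King"),
]

map_output = [pair for key, line in records for pair in map_words(key, line)]
# Shuffle: sort the pairs by word and collect the values for each word.
shuffled = [
    (word, [value for _, value in group])
    for word, group in groupby(sorted(map_output), key=lambda kv: kv[0])
]
result = [reduce_counts(word, ones) for word, ones in shuffled]
print(result)
# prints [('King', 4), ('Minister', 1), ('Queen', 2), ('Soldier', 2)]
```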
  • 86.
    Hadoop MapReduce – Roles: User vs. Framework
    MapReduce
    [Diagram, built up over slides 71 to 86: the word count data flow of slide 62 (Input, Splitting, Map, Shuffling, Reduce, Result), annotated with the steps below.]
    Load data into HDFS
    Specify Path & Input Format
    Create ‘Input Splits’
    Create individual Records
    User Defined Logic
    User Defined Logic
    Specify Path & Output format
    Replication, Rack Awareness etc.
  • 112.
    MapReduce Execution Framework
    MapReduce
    [Diagram, built up over slides 91 to 112: Driver; Input HDFS File - inputFile.txt (Block A, Block B, Block C); InputFormat; Input Split 1, Input Split 2, Input Split 3, Input Split 4; Mapper Process (Record Reader, Mapper); Partition, Shuffle, Sort; Reduce Process (Reducer). Arrow labels: Calculates, Defines, Reads, Passes <K,V> pairs.]
  • 113.
    MapReduce Execution Framework MapReduce ReduceProcess Mapper Process Input HDFS File - inputFile.txt Block A Block B Block C Driver Mapper Reducer Record Reader Input Split 1 Input Split 2 Input Split 3 Input Split 4 InputFormat Reduce Process Mapper Process Mapper Reducer Record Reader Reads Passes <K,V> pairs Reads Passes <K,V> pairs Calculates Defines Passes <K,V> pairs Passes <K,V> pairs <K, V> pairs <K, V> pairs Partition Shuffle Sort
  • 114.
    MapReduce Execution Framework MapReduce ReduceProcess Mapper Process Input HDFS File - inputFile.txt Block A Block B Block C Driver Mapper Reducer Record Reader Input Split 1 Input Split 2 Input Split 3 Input Split 4 InputFormat Reduce Process Mapper Process Mapper Reducer Record Reader Reads Passes <K,V> pairs Reads Passes <K,V> pairs OutputFormat Calculates Defines Passes <K,V> pairs Passes <K,V> pairs <K, V> pairs <K, V> pairs Partition Shuffle Sort
  • 115.
    MapReduce Execution Framework MapReduce ReduceProcess Mapper Process Input HDFS File - inputFile.txt Block A Block B Block C Driver Mapper Reducer Record Reader Input Split 1 Input Split 2 Input Split 3 Input Split 4 InputFormat Output Data Reduce Process Mapper Process Mapper Reducer Record Reader Output Data Reads Passes <K,V> pairs Reads Passes <K,V> pairs OutputFormat Calculates Defines Defines Passes <K,V> pairs Passes <K,V> pairs <K, V> pairs <K, V> pairs Partition Shuffle Sort
  • 116.
    MapReduce Execution Framework MapReduce ReduceProcess Mapper Process Input HDFS File - inputFile.txt Block A Block B Block C Driver Mapper Reducer Record Reader Input Split 1 Input Split 2 Input Split 3 Input Split 4 Writer InputFormat Output Data Reduce Process Mapper Process Mapper Reducer Record Reader Writer Output Data Reads Passes <K,V> pairs Reads Passes <K,V> pairs OutputFormat Calculates Defines Defines Defines Passes <K,V> pairs Passes <K,V> pairs <K, V> pairs <K, V> pairs Partition Shuffle Sort
  • 117.
    MapReduce Execution Framework MapReduce ReduceProcess Mapper Process Input HDFS File - inputFile.txt Block A Block B Block C Driver Mapper Reducer Record Reader Input Split 1 Input Split 2 Input Split 3 Input Split 4 Writer InputFormat Output Data Reduce Process Mapper Process Mapper Reducer Record Reader Writer Output Data Reads Passes <K,V> pairs Reads Passes <K,V> pairs OutputFormat Calculates Defines Defines Defines Passes <K,V> pairs Passes <K,V> pairs <K, V> pairs <K, V> pairs Partition Shuffle Sort
  • 118.
    MapReduce Execution Framework MapReduce ReduceProcess Mapper Process Input HDFS File - inputFile.txt Block A Block B Block C Driver Mapper Reducer Record Reader Input Split 1 Input Split 2 Input Split 3 Input Split 4 Writer InputFormat Output Data Reduce Process Mapper Process Mapper Reducer Record Reader Writer Output Data Reads Passes <K,V> pairs Writes Reads Passes <K,V> pairs Writes OutputFormat Calculates Defines Defines Defines Passes <K,V> pairs Passes <K,V> pairs <K, V> pairs <K, V> pairs Partition Shuffle Sort
  • 119.
    MapReduce Execution Framework MapReduce ReduceProcess Mapper Process Input HDFS File - inputFile.txt Block A Block B Block C Driver Mapper Reducer Record Reader Input Split 1 Input Split 2 Input Split 3 Input Split 4 Writer InputFormat Output Data Reduce Process Mapper Process Mapper Reducer Record Reader Writer Output Data Reads Passes <K,V> pairs Writes Reads Passes <K,V> pairs Writes OutputFormat Defines Calculates Defines Defines Defines Passes <K,V> pairs Passes <K,V> pairs <K, V> pairs <K, V> pairs Partition Shuffle Sort
  • 120.
    MapReduce Execution Framework MapReduce ReduceProcess Mapper Process Input HDFS File - inputFile.txt Block A Block B Block C Driver Mapper Reducer Record Reader Input Split 1 Input Split 2 Input Split 3 Input Split 4 Writer InputFormat Output Data Reduce Process Mapper Process Mapper Reducer Record Reader Writer Output Data Reads Passes <K,V> pairs Writes Reads Passes <K,V> pairs Writes OutputFormat Defines Defines Calculates Defines Defines Defines Passes <K,V> pairs Passes <K,V> pairs <K, V> pairs <K, V> pairs Partition Shuffle Sort
  • 121.
    MapReduce Execution Framework MapReduce ReduceProcess Mapper Process Input HDFS File - inputFile.txt Block A Block B Block C Driver Mapper Reducer Record Reader Input Split 1 Input Split 2 Input Split 3 Input Split 4 Writer InputFormat Output Data Reduce Process Mapper Process Mapper Reducer Record Reader Writer Output Data Reads Passes <K,V> pairs Writes Reads Passes <K,V> pairs Writes OutputFormat Defines Defines Calculates Defines Defines Defines Defines Passes <K,V> pairs Passes <K,V> pairs <K, V> pairs <K, V> pairs Partition Shuffle Sort
  • 122.
    MapReduce Execution Framework MapReduce ReduceProcess Mapper Process Input HDFS File - inputFile.txt Block A Block B Block C Driver Mapper Reducer Record Reader Input Split 1 Input Split 2 Input Split 3 Input Split 4 Writer InputFormat Output Data Reduce Process Mapper Process Mapper Reducer Record Reader Writer Output Data Reads Passes <K,V> pairs Writes Reads Passes <K,V> pairs Writes OutputFormat Defines Defines Calculates Defines Defines Defines Defines Defines Passes <K,V> pairs Passes <K,V> pairs <K, V> pairs <K, V> pairs Partition Shuffle Sort
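The Partition, Shuffle and Sort stages named in the diagram can be sketched in plain Java, without the Hadoop libraries. This is only an in-memory illustration: the class `ShuffleSketch` and its method names are hypothetical, and the only detail borrowed from Hadoop is the formula of its default hash partitioner. Real jobs hash Hadoop key types such as `Text`, so the actual partition numbers will differ from what `String.hashCode()` gives here.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ShuffleSketch {

    // Partition: pick the reducer for a key. Same formula as Hadoop's
    // default HashPartitioner; masking keeps the result non-negative.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Shuffle and sort: route every mapper output pair to its reducer and
    // group the values per key. A TreeMap keeps each reducer's keys sorted.
    static List<TreeMap<String, List<Integer>>> shuffleAndSort(
            List<Map.Entry<String, Integer>> mapperOutput, int numReducers) {
        List<TreeMap<String, List<Integer>>> reducerInputs = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducerInputs.add(new TreeMap<String, List<Integer>>());
        }
        for (Map.Entry<String, Integer> pair : mapperOutput) {
            int target = partition(pair.getKey(), numReducers);
            reducerInputs.get(target)
                    .computeIfAbsent(pair.getKey(), k -> new ArrayList<Integer>())
                    .add(pair.getValue());
        }
        return reducerInputs;
    }

    public static void main(String[] args) {
        // Hypothetical mapper output: (word, 1) pairs in arrival order.
        List<Map.Entry<String, Integer>> mapperOutput = new ArrayList<>();
        mapperOutput.add(new SimpleEntry<String, Integer>("hello", 1));
        mapperOutput.add(new SimpleEntry<String, Integer>("hadoop", 1));
        mapperOutput.add(new SimpleEntry<String, Integer>("hello", 1));

        List<TreeMap<String, List<Integer>>> inputs = shuffleAndSort(mapperOutput, 2);
        for (int i = 0; i < inputs.size(); i++) {
            System.out.println("Reducer " + i + " receives " + inputs.get(i));
        }
    }
}
```

Every value for a given key lands on the same reducer, which is what lets each Reducer see the complete list of values for its keys.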
  • 127.
    Topic 5: MapReduce – Hands On (Part – 2). Class 2 – Hadoop Distributed File System
  • 128.
    AGENDA • What is Big Data? • Hadoop Distributed File System • MapReduce • Understanding Hadoop Ecosystem • Setting up a Hadoop Cluster • HDFS – Hands On • MapReduce – Hands On
  • 129.
    Java MapReduce Programming. The "Hello World" of MapReduce: the Word Count program. Eclipse – Integrated Development Environment (IDE): https://www.eclipse.org/downloads/
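Before moving to the Hadoop API, the logic of the Word Count program can be tried out in plain Java. The sketch below is not the course's program and does not use any Hadoop classes; `WordCountSketch`, `map`, `reduce` and `run` are hypothetical names, and `run` merely stands in for the framework by grouping the mapper output by key before calling the reducer.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {

    // Map step: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<String, Integer>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce step: sum all the counts collected for a single word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Stand-in for the framework: map every line, group the values by key
    // in sorted order, then reduce each group to a single count.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<Integer>())
                        .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hello Hadoop", "hello MapReduce");
        System.out.println(run(lines)); // prints {hadoop=1, hello=2, mapreduce=1}
    }
}
```

In a real Hadoop job the same two steps would live in Mapper and Reducer classes, and the grouping done here by `run` is handled by the framework.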
  • 130.
    RECAP: Part two of the Java MapReduce program

Editor's Notes

  • #3 Welcome to the Big Data Course, jointly presented by Jigsaw Academy and Wiley. Through this course, we hope to create a new international breed of versatile Big Data analysts.  