HADOOP
Interacting with HDFS
For University Program on Apache Hadoop & Apache Apex
→ What's the “Need”? ←
❏ Big data Ocean
❏ Expensive hardware
❏ Frequent Failures and Difficult recovery
❏ Scaling up with more machines
→ Hadoop ←
❏ Open source software
- a Java framework
- initial release: December 10, 2011
❏ It provides both:
❏ Storage → [HDFS]
❏ Processing → [MapReduce]
❏ HDFS: Hadoop Distributed File System
→ How does Hadoop address the need? ←
❏ Big data Ocean
■ Have multiple machines. Each will store some portion of data, not the entire data.
❏ Expensive hardware
■ Use commodity hardware. Simple and cheap.
❏ Frequent Failures and Difficult recovery
■ Have multiple copies of data. Have the copies in different machines.
❏ Scaling up with more machines
■ If more processing is needed, add new machines on the fly
→ HDFS ←
❏ Runs on Commodity hardware: Doesn't require expensive machines
❏ Large Files; Write-once, Read-many (WORM)
❏ Files are split into blocks
❏ Actual blocks go to DataNodes
❏ The metadata is stored at NameNode
❏ Replicate blocks to different nodes
❏ Default configuration:
■ Block size = 128MB
■ Replication Factor = 3
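These two defaults determine how a file is physically laid out. As a rough sketch (plain Python arithmetic, not any Hadoop API), a 1 GB file splits into 8 blocks and, with 3 replicas per block, consumes about 3 GB of raw disk across the DataNodes:

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024   # default HDFS block size (128 MB)
REPLICATION = 3                  # default replication factor

def hdfs_layout(file_size_bytes):
    """Return (block count, raw bytes stored cluster-wide) for one file.

    The last block may be smaller than BLOCK_SIZE; HDFS stores only the
    bytes it needs, so raw usage is file size times the replication factor.
    """
    blocks = math.ceil(file_size_bytes / BLOCK_SIZE)
    return blocks, file_size_bytes * REPLICATION

blocks, raw = hdfs_layout(1024 ** 3)   # a 1 GB file -> 8 blocks, 3 GB raw
```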
→ Where NOT to use HDFS ←
❏ Low latency data access
■ HDFS is optimized for high throughput of data at the expense of latency.
❏ Large number of small files
■ The NameNode keeps the entire file-system metadata in memory.
■ With many small files, the metadata grows out of proportion to the actual data.
❏ Multiple writers / Arbitrary file modifications
■ No support for multiple concurrent writers to a file
■ Writes can only append to the end of a file
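The small-files point can be made concrete. Each file and each block is an object held in NameNode heap; a commonly quoted rough figure is ~150 bytes per object (a rule of thumb, not an exact spec). A sketch comparing one large file with the same data stored as many small files:

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024   # default block size
BYTES_PER_OBJECT = 150           # rough rule of thumb for NameNode heap per object

def namenode_heap_bytes(num_files, file_size_bytes):
    """Estimate NameNode heap: one object per file plus one per block."""
    blocks_per_file = max(1, math.ceil(file_size_bytes / BLOCK_SIZE))
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT

one_big    = namenode_heap_bytes(1, 1024 ** 3)      # one 1 GB file
many_small = namenode_heap_bytes(1024, 1024 ** 2)   # same 1 GB as 1024 x 1 MB files
# many_small needs hundreds of times more NameNode memory than one_big
```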
→ Some Key Concepts ←
❏ NameNode
❏ DataNodes
❏ JobTracker
❏ TaskTrackers
❏ ResourceManager (MRv2)
❏ NodeManager (MRv2)
❏ ApplicationMaster (MRv2)
→ NameNode & DataNodes ←
❏ NameNode:
■ Centerpiece of HDFS: The Master
■ Stores only the block metadata: block name, block location, etc.
■ Critical component; When down, whole cluster is considered down; Single point of failure
■ Should be configured with higher RAM
❏ DataNode:
■ Stores the actual data: The Slave
■ In constant communication with NameNode
■ When down, it does not affect the availability of data/cluster
■ Should be configured with higher disk space
❏ SecondaryNameNode:
■ Does not act as a standby NameNode
■ Stores a checkpointed image of the primary NameNode's metadata
■ Used as a backup to restore the NameNode
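Replica placement ties the NameNode and DataNode roles together. The default rack-aware policy places the first replica on the writer's DataNode, the second on a node in a different rack, and the third on another node in that same remote rack. A simplified illustration with hypothetical node names (the real implementation also weighs node load and free space):

```python
import random

def place_replicas(racks, writer_node):
    """Pick three replica locations following the default rack-aware policy (simplified).

    racks: dict mapping rack name -> list of DataNode names.
    """
    node_rack = {n: r for r, nodes in racks.items() for n in nodes}
    # second replica goes to a node on a *different* rack than the writer
    remote_rack = random.choice([r for r in racks if r != node_rack[writer_node]])
    second = random.choice(racks[remote_rack])
    # third replica goes to another node on that same remote rack
    third = random.choice([n for n in racks[remote_rack] if n != second])
    return [writer_node, second, third]

racks = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}  # hypothetical cluster
placement = place_replicas(racks, "dn1")
```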
→ JobTracker & TaskTrackers ←
❏ JobTracker:
■ Talks to the NameNode to determine location of the data
■ Monitors all TaskTrackers and submits status of the job back to the client
■ When down, HDFS is still functional; no new MR jobs can be submitted; existing jobs are halted
■ Replaced by ResourceManager/ApplicationMaster in MRv2
❏ TaskTracker:
■ Runs on all DataNodes
■ TaskTracker communicates with JobTracker signaling the task progress
■ TaskTracker failure is not considered fatal
■ Replaced by NodeManager in MRv2
→ ResourceManager & NodeManager ←
❏ Present in Hadoop v2.0
❏ Equivalent of JobTracker & TaskTracker in v1.0
❏ ResourceManager (RM):
■ Usually runs on the NameNode machine; distributes resources among applications.
■ Two main components: Scheduler and ApplicationsManager (AM)
❏ NodeManager (NM):
■ Per-node framework agent
■ Responsible for containers
■ Monitors their resource usage
■ Reports the stats to RM
The central ResourceManager and the per-node NodeManagers together are called YARN (Yet Another Resource Negotiator).
→ Hadoop 1.0 vs. 2.0 ←
❏ HDFS 1.0:
■ Single point of failure
■ Performance issues with horizontal scaling
❏ HDFS 2.0:
■ HDFS High Availability
■ HDFS Snapshot
■ Improved performance
■ HDFS Federation
HDFS Federation
→ Interacting with HDFS ←
❏ Command prompt:
■ Similar to Linux terminal commands
■ Modeled on Unix shell utilities; HDFS relaxes strict POSIX semantics
❏ Web Interface:
■ Similar to browsing an FTP site on the web
Interacting With HDFS
On Command Prompt
→ Notes ←
File Paths on HDFS:
■ hdfs://127.0.0.1:8020/user/USERNAME/demo/data/file.txt
■ hdfs://localhost:8020/user/USERNAME/demo/data/file.txt
■ /user/USERNAME/demo/file.txt
■ demo/file.txt
File System:
■ Local: the local (Linux) file system
■ HDFS: the Hadoop distributed file system
In some places, the terms “file” and “directory” are used interchangeably.
→ Before we start ←
❏ Command:
■ hdfs
❏ Usage:
■ hdfs [--config confdir] COMMAND
❏ Example:
■ hdfs dfs
■ hdfs dfsadmin
■ hdfs fsck
■ hdfs namenode
■ hdfs datanode
hdfs `dfs` commands
→ General syntax for `dfs` commands ←
hdfs dfs -<COMMAND> [-OPTIONS] <PARAMETERS>
e.g.
hdfs dfs -ls -R /user/USERNAME/demo/data/
0. Do it yourself
❏ Syntax:
■ hdfs dfs -help [COMMAND … ]
■ hdfs dfs -usage [COMMAND … ]
❏ Example:
■ hdfs dfs -help cat
■ hdfs dfs -usage cat
1. List the file/directory
❏ Syntax:
■ hdfs dfs -ls [-d] [-h] [-R] <hdfs-dir-path>
❏ Example:
■ hdfs dfs -ls
■ hdfs dfs -ls /
■ hdfs dfs -ls /user/USERNAME/demo/list-dir-example
■ hdfs dfs -ls -R /user/USERNAME/demo/list-dir-example
2. Creating a directory
❏ Syntax:
■ hdfs dfs -mkdir [-p] <hdfs-dir-path>
❏ Example:
■ hdfs dfs -mkdir /user/USERNAME/demo/create-dir-example
■ hdfs dfs -mkdir -p /user/USERNAME/demo/create-dir-example/dir1/dir2/dir3
3. Create a file on local & put it on HDFS
❏ Syntax:
■ vi filename.txt
■ hdfs dfs -put [options] <local-file-path> <hdfs-dir-path>
❏ Example:
■ vi file-copy-to-hdfs.txt
■ hdfs dfs -put file-copy-to-hdfs.txt /user/USERNAME/demo/put-example/
4. Get a file from HDFS to local
❏ Syntax:
■ hdfs dfs -get <hdfs-file-path> [local-dir-path]
❏ Example:
■ hdfs dfs -get /user/USERNAME/demo/get-example/file-copy-from-hdfs.txt ~/demo/
5. Copy From LOCAL To HDFS
❏ Syntax:
■ hdfs dfs -copyFromLocal <local-file-path> <hdfs-file-path>
❏ Example:
■ hdfs dfs -copyFromLocal file-copy-to-hdfs.txt /user/USERNAME/demo/copyFromLocal-example/
6. Copy To LOCAL From HDFS
❏ Syntax:
■ hdfs dfs -copyToLocal <hdfs-file-path> <local-file-path>
❏ Example:
■ hdfs dfs -copyToLocal /user/USERNAME/demo/copyToLocal-example/file-copy-from-hdfs.txt ~/demo/
7. Move a file from local to HDFS
❏ Syntax:
■ hdfs dfs -moveFromLocal <local-file-path> <hdfs-dir-path>
❏ Example:
■ hdfs dfs -moveFromLocal /path/to/file.txt /user/USERNAME/demo/moveFromLocal-example/
8. Copy a file within HDFS
❏ Syntax:
■ hdfs dfs -cp <hdfs-source-file-path> <hdfs-dest-file-path>
❏ Example:
■ hdfs dfs -cp /user/USERNAME/demo/copy-within-hdfs/file-copy.txt /user/USERNAME/demo/data/
9. Move a file within HDFS
❏ Syntax:
■ hdfs dfs -mv <hdfs-source-file-path> <hdfs-dest-file-path>
❏ Example:
■ hdfs dfs -mv /user/USERNAME/demo/move-within-hdfs/file-move.txt /user/USERNAME/demo/data/
10. Merge files on HDFS
❏ Syntax:
■ hdfs dfs -getmerge [-nl] <hdfs-dir-path> <local-file-path>
❏ Examples:
■ hdfs dfs -getmerge -nl /user/USERNAME/demo/merge-example/ /path/to/all-files.txt
11. View file contents
❏ Syntax:
■ hdfs dfs -cat <hdfs-file-path>
■ hdfs dfs -tail <hdfs-file-path>
■ hdfs dfs -text <hdfs-file-path>
❏ Examples:
■ hdfs dfs -cat /user/USERNAME/demo/data/cat-example.txt
■ hdfs dfs -cat /user/USERNAME/demo/data/cat-example.txt | head
12. Remove files/dirs from HDFS
❏ Syntax:
■ hdfs dfs -rm [options] <hdfs-file-path>
❏ Examples:
■ hdfs dfs -rm /user/USERNAME/demo/remove-example/remove-file.txt
■ hdfs dfs -rm -R /user/USERNAME/demo/remove-example/
■ hdfs dfs -rm -R -skipTrash /user/USERNAME/demo/remove-example/
13. Change file/dir properties
❏ Syntax:
■ hdfs dfs -chgrp [-R] <NewGroupName> <hdfs-file-path>
■ hdfs dfs -chmod [-R] <permissions> <hdfs-file-path>
■ hdfs dfs -chown [-R] <NewOwnerName> <hdfs-file-path>
❏ Examples:
■ hdfs dfs -chmod -R 777 /user/USERNAME/demo/data/file-change-properties.txt
14. Check the file size
❏ Syntax:
■ hdfs dfs -du <hdfs-file-path>
❏ Examples:
■ hdfs dfs -du /user/USERNAME/demo/data/file.txt
■ hdfs dfs -du -s -h /user/USERNAME/demo/data/
15. Create a zero byte file in HDFS
❏ Syntax:
■ hdfs dfs -touchz <hdfs-file-path>
❏ Examples:
■ hdfs dfs -touchz /user/USERNAME/demo/data/zero-byte-file.txt
16. File test operations
❏ Syntax:
■ hdfs dfs -test -[defsz] <hdfs-file-path>
❏ Examples:
■ hdfs dfs -test -e /user/USERNAME/demo/data/file.txt
■ echo $?
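`-test` reports its answer through the shell exit status (`0` means true), which is why the example checks `echo $?`. The flags map to simple predicates: `-e` exists, `-d` is a directory, `-f` is a regular file, `-s` non-empty, `-z` zero length. A local-filesystem analogue of those predicates (plain Python `os.path`, not the HDFS client):

```python
import os

def hdfs_test(flag, path):
    """Local analogue of `hdfs dfs -test -[defsz]`.

    Returns True where the real command would exit with status 0.
    """
    checks = {
        "e": os.path.exists,    # -e: path exists
        "d": os.path.isdir,     # -d: path is a directory
        "f": os.path.isfile,    # -f: path is a regular file
        "s": lambda p: os.path.isfile(p) and os.path.getsize(p) > 0,   # non-empty file
        "z": lambda p: os.path.isfile(p) and os.path.getsize(p) == 0,  # zero-length file
    }
    return checks[flag](path)
```

In a script you would branch on the return value the same way a shell script branches on `$?`.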
17. Get FileSystem Statistics
❏ Syntax:
■ hdfs dfs -stat [format] <hdfs-file-path>
❏ Format Options:
■ %b - file size in bytes
■ %g - group name of owner
■ %n - filename
■ %o - block size
■ %r - replication factor
■ %u - user name of owner
■ %y - modification date
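The format string behaves like printf-style substitution over a file's metadata. A toy re-implementation over a dict of metadata fields (hypothetical values; the real command queries the NameNode):

```python
def hdfs_stat(fmt, meta):
    """Toy version of `hdfs dfs -stat <format>`: substitute %-directives."""
    directives = {
        "%n": meta["name"],               # filename
        "%b": str(meta["size"]),          # file size
        "%o": str(meta["block_size"]),    # block size
        "%r": str(meta["replication"]),   # replication factor
        "%u": meta["owner"],              # owner user name
        "%g": meta["group"],              # owner group name
        "%y": meta["mtime"],              # modification date
    }
    out = fmt
    for key, value in directives.items():
        out = out.replace(key, value)
    return out

# hypothetical metadata record for illustration only
meta = {"name": "file.txt", "size": 1048576, "block_size": 134217728,
        "replication": 3, "owner": "USERNAME", "group": "hadoop",
        "mtime": "2016-03-15 10:00:00"}
```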
18. Get File/Dir Counts
❏ Syntax:
■ hdfs dfs -count [-q] [-h] [-v] <hdfs-file-path>
❏ Example:
■ hdfs dfs -count -v /user/USERNAME/demo/
19. Set replication factor
❏ Syntax:
■ hdfs dfs -setrep [-R] [-w] <n> <hdfs-file-path>
❏ Examples:
■ hdfs dfs -setrep -w -R 2 /user/USERNAME/demo/data/file.txt
20. Set Block Size
❏ Syntax:
■ hdfs dfs -D dfs.blocksize=<bytes> -copyFromLocal <local-file-path> <hdfs-file-path>
❏ Examples:
■ hdfs dfs -D dfs.blocksize=67108864 -copyFromLocal /path/to/file.txt /user/USERNAME/demo/block-example/
21. Empty the HDFS trash
❏ Syntax:
■ hdfs dfs -expunge
❏ Location: trash is kept under /user/USERNAME/.Trash
Other hdfs commands (admin)
22. HDFS Admin Commands: fsck
❏ Syntax:
■ hdfs fsck <hdfs-file-path>
❏ Options:
[-list-corruptfileblocks |
[-move | -delete | -openforwrite]
[-files [-blocks [-locations | -racks]]]]
[-includeSnapshots]
23. HDFS Admin Commands: dfsadmin
❏ Syntax:
■ hdfs dfsadmin
❏ Options:
[-report [-live] [-dead] [-decommissioning]]
[-safemode enter | leave | get | wait]
[-refreshNodes]
[-refresh <host:ipc_port> <key> [arg1..argn]]
[-shutdownDatanode <datanode:port> [upgrade]]
[-getDatanodeInfo <datanode_host:ipc_port>]
[-help [cmd]]
❏ Examples:
■ hdfs dfsadmin -report -live
24. HDFS Admin Commands: namenode
❏ Syntax:
■ hdfs namenode
❏ Options:
[-checkpoint] |
[-format [-clusterid cid ] [-force] [-nonInteractive] ] |
[-upgrade [-clusterid cid] ] |
[-rollback] |
[-recover [-force] ] |
[-metadataVersion ]
❏ Examples:
■ hdfs namenode -help
25. HDFS Admin Commands: getconf
❏ Syntax:
■ hdfs getconf [-options]
❏ Options:
[ -namenodes ] [ -secondaryNameNodes ]
[ -backupNodes ] [ -includeFile ]
[ -excludeFile ] [ -nnRpcAddresses ]
[ -confKey [key] ]
Again, THE most important command!
❏ Syntax:
■ hdfs dfs -help [options]
■ hdfs dfs -usage [options]
❏ Examples:
■ hdfs dfs -help help
■ hdfs dfs -usage usage
Interacting With HDFS
In Web Browser
Web HDFS
URL:
http://namenode:50070/explorer.html
Examples:
http://localhost:50070/explorer.html
http://ec2-52-23-214-111.compute-1.amazonaws.com:50070/explorer.html
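Besides the explorer UI, the same NameNode HTTP port exposes the WebHDFS REST API, so directory listings can be fetched programmatically. A minimal sketch that only builds the request URL (host, path, and the non-secure default port 50070 are assumptions for a stock Hadoop 2 setup):

```python
def webhdfs_url(host, path, op="LISTSTATUS", port=50070):
    """Build a WebHDFS REST URL (default, non-secure NameNode HTTP port)."""
    return "http://{}:{}/webhdfs/v1{}?op={}".format(host, port, path, op)

# the resulting URL can be fetched with curl or any HTTP client, e.g.:
# curl "http://localhost:50070/webhdfs/v1/user/USERNAME/demo?op=LISTSTATUS"
url = webhdfs_url("localhost", "/user/USERNAME/demo")
```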
Thank You!!
Please send your questions to:
pradeep@datatorrent.com
pradeep.n.kumbhar@gmail.com
© 2016 DataTorrent
Resources
• Apache Apex website - http://apex.incubator.apache.org/
• Subscribe - http://apex.incubator.apache.org/community.html
• Download - http://apex.incubator.apache.org/downloads.html
• Twitter - @ApacheApex; Follow - https://twitter.com/apacheapex
• Facebook - https://www.facebook.com/ApacheApex/
• Meetup - http://www.meetup.com/topics/apache-apex
• Startup Program – Free Enterprise License for Startups, Educational Institutions,
Non-Profits - https://www.datatorrent.com/product/startup-accelerator/
• Cloud Trial - http://web.datatorrent.com/cloudtrial.html
We Are Hiring
• jobs@datatorrent.com
• Developers/Architects
• QA Automation Developers
• Information Developers
• Build and Release
Upcoming Events
• March 15th
– …
• March 17th
6pm PST – Title
• March 24th
9am PST – Title
• …
APPENDIX
Copy data from one node to another node in HDFS
❏ Description:
❏ Copy data between clusters
❏ Syntax:
■ hadoop distcp hdfs://nn1:8020/foo/bar hdfs://nn2:8020/bar/foo
■ hadoop distcp hdfs://nn1:8020/foo/a hdfs://nn1:8020/foo/b hdfs://nn2:8020/bar/foo
■ hadoop distcp -f hdfs://nn1:8020/srclist.file hdfs://nn2:8020/bar/foo
Where srclist.file contains
■ hdfs://nn1:8020/foo/a
■ hdfs://nn1:8020/foo/b
