This document discusses web services in Hadoop, including RESTful APIs that provide programmatic access to Hadoop components such as HDFS and HCatalog and to job submission and monitoring. It describes the design goals of WebHDFS: a public HTTP API, high performance, cross-version compatibility, and security. Examples show how to use curl and wget to read HDFS files through WebHDFS URLs. It also summarizes the HCatalog REST API, which supports creating, querying, and managing Hadoop metadata, and closes with future work on job management and authentication.
4. HDFS Clients
• DFSClient: the native client
– High performance (using RPC)
– Java binding
• libhdfs: a C client interface
– Using JNI => large overhead
– Also a Java binding (requires a Hadoop installation)
Architecting the Future of Big Data Page 4
5. HFTP
• Designed for cross-version copying (DistCp)
– High performance (using HTTP)
– Read-only
– The HTTP API is proprietary
– Clients must use HftpFileSystem (hftp://)
• WebHDFS is a rewrite of HFTP
6. Design Goals
• Support a public HTTP API
• Support Read and Write
• High Performance
• Cross-version
• Security
7. WebHDFS features
• HTTP REST API
– Defines a public API
– Permits non-Java client implementation
– Supports common tools such as curl and wget
• Wire Compatibility
– The REST API will be maintained for wire compatibility
– WebHDFS clients can talk to different Hadoop versions.
8. WebHDFS features (2)
• A Complete HDFS Interface
– Supports all user operations
– reading files
– writing to files
– mkdir, chmod, chown, mv, rm, …
• High Performance
– Uses HTTP redirection to provide data locality
– File reads and writes are redirected to the corresponding datanodes
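The user operations listed on this slide correspond to HTTP methods and `op` query values. The Python sketch below shows one way to express that mapping. The operation names are taken from the public WebHDFS REST API documentation rather than from these slides, and the helper name `request_line` is hypothetical. Several operations need extra query parameters (for example `permission` for SETPERMISSION or `destination` for RENAME), which are omitted here.

```python
# Shell-style command -> (HTTP method, WebHDFS op value).
# Op names follow the WebHDFS REST API documentation (not these slides).
WEBHDFS_OPS = {
    "read": ("GET", "OPEN"),
    "write": ("PUT", "CREATE"),
    "append": ("POST", "APPEND"),
    "mkdir": ("PUT", "MKDIRS"),
    "chmod": ("PUT", "SETPERMISSION"),
    "chown": ("PUT", "SETOWNER"),
    "mv": ("PUT", "RENAME"),
    "rm": ("DELETE", "DELETE"),
}


def request_line(command, path):
    """Return the HTTP method and request target for a shell-style command.

    `path` is an absolute HDFS path such as "/user/szetszwo/w.txt".
    """
    method, op = WEBHDFS_OPS[command]
    return f"{method} /webhdfs/v1{path}?op={op}"
```

For instance, `request_line("mkdir", "/user/szetszwo/dir")` yields a PUT request on the `/webhdfs/v1` prefix with `op=MKDIRS`.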
9. WebHDFS features (3)
• Secure Authentication
– Same as Hadoop authentication: Kerberos (SPNEGO) and Hadoop delegation tokens
– Supports proxy users
• An HDFS Built-in Component
– WebHDFS is a first-class built-in component of HDFS.
– Runs inside Namenodes and Datanodes
• Apache Open Source
– Available in Apache Hadoop 1.0 and above.
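The authentication options on this slide surface in the REST API mostly as query parameters. The sketch below is an illustration under assumptions: the parameter names `user.name`, `doas`, and `delegation` come from the WebHDFS REST API documentation, not from these slides, and the helper `with_auth` is hypothetical. Kerberos (SPNEGO) itself is negotiated in HTTP headers and is not shown.

```python
from urllib.parse import urlencode


def with_auth(url, user=None, doas=None, delegation_token=None):
    """Append authentication query parameters to a WebHDFS URL.

    Assumes `url` already carries a query string such as "?op=OPEN".
    """
    params = {}
    if delegation_token is not None:
        # Hadoop delegation token obtained earlier by the client.
        params["delegation"] = delegation_token
    elif user is not None:
        # Plain user name, used when security is off.
        params["user.name"] = user
    if doas is not None:
        # Proxy user: act on behalf of another user.
        params["doas"] = doas
    if not params:
        return url
    return url + "&" + urlencode(params)
```

Calling `with_auth(url, user="szetszwo", doas="alice")` appends `user.name=szetszwo` followed by `doas=alice`; the user name `alice` is made up for the example.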
10. WebHDFS URI & URL
• FileSystem scheme:
webhdfs://
• FileSystem URI:
webhdfs://<HOST>:<HTTP_PORT>/<PATH>
• HTTP URL:
http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=..
– Path prefix: /webhdfs/v1
– Query: ?op=..
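The URL layout above can be assembled programmatically. The following is a minimal Python sketch; the function name `webhdfs_url` and its parameters are hypothetical, and only the `/webhdfs/v1` prefix and the `op` query parameter are taken from the slide.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, http_port, path, op, **params):
    """Build http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=<OP>[&key=value...].

    `path` is an absolute HDFS path; extra keyword arguments become
    additional query parameters after `op`.
    """
    query = urlencode({"op": op, **params})
    # quote() percent-encodes unsafe characters but leaves "/" intact.
    return f"http://{host}:{http_port}/webhdfs/v1{quote(path)}?{query}"
```

Keeping `op` first in the query string is only a readability choice here; the order of query parameters is not significant in HTTP.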
11. URI/URL Examples
• Suppose we have the following file
hdfs://namenode:8020/user/szetszwo/w.txt
• WebHDFS FileSystem URI
webhdfs://namenode:50070/user/szetszwo/w.txt
• WebHDFS HTTP URL
http://namenode:50070/webhdfs/v1/user/szetszwo/w.txt?op=..
• WebHDFS HTTP URL to open the file
http://namenode:50070/webhdfs/v1/user/szetszwo/w.txt?op=OPEN
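The step from the WebHDFS FileSystem URI to the HTTP URL in these examples is mechanical, as the Python sketch below suggests. The function name `webhdfs_uri_to_http` is hypothetical, and the sketch assumes the path needs no percent-encoding.

```python
from urllib.parse import urlsplit


def webhdfs_uri_to_http(uri, op):
    """Turn webhdfs://<HOST>:<HTTP_PORT>/<PATH> into the matching HTTP URL."""
    parts = urlsplit(uri)
    if parts.scheme != "webhdfs":
        raise ValueError(f"not a webhdfs:// URI: {uri}")
    # Host and port carry over unchanged; the path gains the /webhdfs/v1 prefix.
    return f"http://{parts.netloc}/webhdfs/v1{parts.path}?op={op}"
```

Note that the hdfs:// URI on this slide uses the RPC port (8020), so it cannot be converted the same way: the HTTP port (50070 in these examples) has to be known separately.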
12. Example: curl
• Use curl to open a file
$ curl -i -L "http://namenode:50070/webhdfs/v1/user/szetszwo/w.txt?op=OPEN"
HTTP/1.1 307 TEMPORARY_REDIRECT
Content-Type: application/octet-stream
Location: http://192.168.5.2:50075/webhdfs/v1/user/szetszwo/w.txt?op=OPEN&offset=0
Content-Length: 0
Server: Jetty(6.1.26)
13. Example: curl (2)
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 21
Server: Jetty(6.1.26)
Hello, WebHDFS user!
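The two responses in the curl output (a 307 from the namenode, then a 200 from a datanode) can be reproduced step by step. The Python sketch below follows the redirect by hand using only the standard library. The function name `read_file` is hypothetical, and the sketch assumes an unsecured cluster and does no error handling beyond checking for the redirect.

```python
import http.client
from urllib.parse import urlsplit


def read_file(namenode_host, namenode_port, path):
    """Read an HDFS file over WebHDFS, following the 307 redirect manually."""
    # Step 1: ask the namenode. It answers with a redirect, not with data.
    conn = http.client.HTTPConnection(namenode_host, namenode_port)
    conn.request("GET", f"/webhdfs/v1{path}?op=OPEN")
    resp = conn.getresponse()
    resp.read()  # drain the (empty) redirect body
    status = resp.status
    location = resp.getheader("Location")
    conn.close()
    if status != 307 or location is None:
        raise RuntimeError(f"expected a 307 redirect, got {status}")

    # Step 2: fetch the bytes from the datanode named in the Location header.
    target = urlsplit(location)
    conn = http.client.HTTPConnection(target.hostname, target.port)
    conn.request("GET", f"{target.path}?{target.query}")
    resp = conn.getresponse()
    data = resp.read()
    conn.close()
    return data
```

In practice a client library that follows redirects automatically (as `curl -L` does) achieves the same result; the manual version only makes the namenode-to-datanode hand-off visible.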
14. Example: wget
• Use wget to open the same file
$ wget "http://namenode:50070/webhdfs/v1/user/szetszwo/w.txt?op=OPEN" -O w.txt
Resolving ...
Connecting to ... connected.
HTTP request sent, awaiting response...
307 TEMPORARY_REDIRECT
Location: http://192.168.5.2:50075/webhdfs/v1/user/szetszwo/w.txt?op=OPEN&offset=0 [following]
15. Example: wget (2)
--2012-06-13 01:42:10-- http://192.168.5.2:50075/webhdfs/v1/user/szetszwo/w.txt?op=OPEN&offset=0
Connecting to 192.168.5.2:50075... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21 [application/octet-stream]
Saving to: `w.txt'
100%[=================>] 21 --.-K/s in 0s
2012-06-13 01:42:10 (3.34 MB/s) - `w.txt' saved [21/21]