A web service is a method of communication between two
electronic devices over a network.
The W3C defines a web service as a software system designed to support interoperable machine-to-machine interaction over a network.
Nowadays the web is full of web services: search engines, online stores, weblogs, wikis, calculators, games and so on, and so it is full of data. Machines fetch this data from the programmable web to perform tasks.
The programmable web is much the same as the Web that we humans interact with.
The main difference is that instead of arranging its data in attractive HTML pages with banners, ads and cute logos, the programmable web usually serves stark, brutal XML documents that are not necessarily meant for human consumption. Its data is intended as input to a software program that performs some task.
At its best, the programmable web works the same way as the human web. Its clients retrieve data from it and figure out what the data means, and they can also modify the programmable web, just like the human web.
Kinds of Things on the Programmable Web
The programmable web is based on HTTP and XML. Some parts of it serve HTML, JSON, plain text or binary documents, but most parts use XML, and it's all based on HTTP.
There are basically two ways of classifying the services that inhabit the programmable web: by the technologies they use (URIs, SOAP, XML-RPC and so on), or by the underlying architectures and design philosophies.
Most of today's terminology sorts services by their superficial appearances: the technology they use. These classifications work in most cases, but they're conceptually lacking and lead to mistakes. A taxonomy based on architecture would be better, because it shows how technology choices follow from underlying design principles.
HTTP – The Common Thing of the Programmable Web
To classify the programmable web, it's better to start off with an overview of HTTP, the protocol that all web services have in common.
HTTP is a document-based protocol, in which the client puts a document in an envelope and sends it to the server. The server returns the favor by putting a response document in an envelope and sending it to the client. The protocol defines a strict format for the envelope, but it doesn't care what goes inside.
In HTTP terms, this envelope is called either a request or a response. When a client sends the envelope to a server, it is called an HTTP Request, and when the envelope comes back from the server, it is called an HTTP Response.
A Sample HTTP Request
A sample HTTP request is given below -
GET /index.html HTTP/1.1
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12)...
The name of the HTTP method is like a method name in a programming language: it indicates how the client expects the server to process this envelope.
In the sample request, the method is GET, which means the client is expecting the server to get some information for it.
The HTTP standard defines a total of 9 methods. They are -
1. GET
2. POST
3. HEAD
4. PUT
5. DELETE
6. TRACE
7. OPTIONS
8. CONNECT
9. PATCH (RFC 5789)
Parts of an HTTP request - The path
This is the portion of the URI to the right of the hostname. In this example, it's /index.html.
In terms of the envelope metaphor, the path is the address on the envelope.
When combined with the hostname, it becomes something like http://www.oreilly.com/index.html, which uniquely specifies a resource on the web.
Parts of an HTTP request – The request headers
These are bits of metadata: key-value pairs that act like informational stickers slapped onto the envelope.
Our sample request has 8 headers: Host, User-Agent, Accept, and so on.
There's a standard list of HTTP headers, and applications can define their own.
Parts of an HTTP request – The entity-body
It's also called the document or representation.
This is the document that is inside the envelope. This particular request has no entity-body, which means the envelope is empty.
This is typical for a GET request, where all the information needed to complete the request is in the path and the headers.
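The three parts described above (method and path, headers, entity-body) can be pulled out of a raw request with a few lines of Python. This is only a minimal sketch: parse_request is a hypothetical helper, not a real HTTP parser, and the Host and Accept values in the sample are illustrative assumptions rather than the full header set of the original request.

```python
def parse_request(raw: str):
    """Split a raw HTTP request into method, path, version, headers and body."""
    # A blank line separates the head of the envelope from the entity-body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # First line: method, path and protocol version.
    method, path, version = lines[0].split(" ", 2)
    # Remaining lines: "Name: value" stickers on the envelope.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, path, version, headers, body


# Illustrative request; header values are assumptions for this sketch.
raw_request = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: www.oreilly.com\r\n"
    "Accept: text/html\r\n"
    "\r\n"
)

method, path, version, headers, body = parse_request(raw_request)
print(method, path, version)
print(headers)
print(repr(body))  # empty: a GET request normally carries no entity-body
```

The empty body reflects the point made above: for a GET request, everything the server needs is in the path and the headers.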
A Sample HTTP Response
A sample HTTP response is given below -
HTTP/1.1 200 OK
Date: Fri, 17 Nov 2006 15:36:32 GMT
Last-Modified: Fri, 17 Nov 2006 09:05:32 GMT
X-Cache: MISS from www.oreilly.com
Keep-Alive: timeout=15, max=1000
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN ......
Parts of HTTP Response – HTTP Response Code
It's a numeric code that tells the client whether the request went well or poorly, and how the client should regard this envelope and its contents.
In our example, the GET operation must have succeeded, since the response code is 200, which is interpreted as OK.
There are 5 categories of HTTP response code. They are -
1xx: Informational
2xx: Success
3xx: Redirection
4xx: Client Error
5xx: Server Error
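Since the category is determined by the first digit of the code, a client can classify any response with integer division. The sketch below assumes a hypothetical helper named status_category. It also uses the standard library's http.HTTPStatus to look up the reason phrase for a code.

```python
from http import HTTPStatus

# First digit of the status code -> category name.
CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_category(code: int) -> str:
    """Return the category of an HTTP response code, based on its first digit."""
    return CATEGORIES.get(code // 100, "Unknown")


print(status_category(200), "-", HTTPStatus(200).phrase)
print(status_category(404), "-", HTTPStatus(404).phrase)
```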
Parts of HTTP Response – The entity-body and response headers
Again, the entity-body is the document inside the envelope. This time, the response has an entity-body, which is the fulfillment of the GET request.
The response headers are the same as the request headers – they are just informational stickers slapped onto the envelope.
Our example response has a total of 11 response headers: Date, Server and so on.
Differences between web services
HTTP is the one thing that all services on the programmable web have in common. But there are two big questions that today's web services answer differently.
If we know how a web service answers these questions, then we will have a good idea of how well it works with the web.
The First Question – Method Information
The first burning question is: how can the client convey its intentions to the server? How does the server know a certain request is a request to retrieve some data, instead of a request to delete that same data or to overwrite it with different data? Why should the server do this instead of doing that?
This information about what to do with the data can be called the Method Information.
One way to convey method information in a web service is to put it in the HTTP method. This is how RESTful web services do it.
The great advantage of HTTP method names is that they're standardized.
Some web services keep method information in the URI path or the request document.
For example, consider the Flickr web service. When someone sends an HTTP request to its search API, the server searches for photos. The HTTP method used here is GET.
But Flickr supports many methods, not just GET-type ones: there are also methods such as flickr.photos.addTags and flickr.photos.comments.deleteComment. All of them are invoked with an HTTP GET request, regardless of whether or not they get any data. So in practice Flickr puts its method information in the URI. Similarly, SOAP services don't put their method information in the HTTP method; they store it in the entity-body and in an HTTP header.
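The difference between the two conventions can be sketched in Python. The function below is a hypothetical illustration, not part of any real API: it assumes a Flickr-style service names its operation in a query parameter called method, and the example.com URIs are made up for the example.

```python
from urllib.parse import urlsplit, parse_qs


def method_information(http_method: str, uri: str) -> str:
    """Return what the request is really asking the server to do.

    Assumption for this sketch: an RPC-like service names the operation in a
    'method' query parameter; otherwise the HTTP method itself is the answer.
    """
    query = parse_qs(urlsplit(uri).query)
    if "method" in query:
        return query["method"][0]
    return http_method


# RESTful style: the HTTP method carries the method information.
restful = method_information("DELETE", "http://example.com/photos/42")

# Flickr style: always GET, with the real operation tucked into the URI.
rpc_like = method_information(
    "GET",
    "http://example.com/services?method=flickr.photos.comments.deleteComment&comment_id=7",
)

print(restful)
print(rpc_like)
```

In the second call the HTTP method says GET while the URI asks for a deletion, which is exactly the mismatch described above.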
The Second Question: Scoping Information
The second question is: how does the client tell the server which part of the data set to operate on? Given that the server understands that the client wants to delete some data, how can it know which data the client wants to delete? Why should the server operate on this data instead of that data?
This information can be called the Scoping Information. One obvious place to put it is in the URI path. That's what most web sites do.
For example, the URI http://www.google.com/search?q=REST tells the server that the client wants to get a list of search results about REST. Here the method information is GET and the scoping information is /search?q=REST.
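For the Google search URI http://www.google.com/search?q=REST, the scoping information can be separated from the hostname with Python's standard urllib.parse module. This is a small sketch. The variable names are chosen for illustration only.

```python
from urllib.parse import urlsplit, parse_qs

http_method = "GET"  # the method information
uri = "http://www.google.com/search?q=REST"

parts = urlsplit(uri)
# Scoping information: everything to the right of the hostname.
scoping_information = parts.path + "?" + parts.query
# The query string can be read as arguments that narrow the data set.
search_terms = parse_qs(parts.query)["q"]

print(http_method, scoping_information)
print(search_terms)
```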
Scoping information (Contd.)
Many web services put scoping information in the path. In a service where the method information defines a method in the programming language sense, the scoping information can be seen as a set of arguments to that method.
The alternative is to put the scoping information into the entity-body. A typical SOAP service does it this way.
Generally, the service design determines what information is method information and what information is scoping information.
Types of web service architecture
Now that the two big questions that web services answer differently have been identified, we can group web service architectures into three categories -
RESTful resource-oriented architectures
RPC-style architectures
REST-RPC hybrid architectures
Type 1: RESTful resource-oriented web services
A web service can be considered RESTful if it satisfies the following constraints -
Client-server: Clients are separated from servers by a uniform interface.
Stateless: No client context is stored on the server between requests.
Cacheable: Clients are able to cache responses.
Layered system: A client cannot ordinarily tell whether it is connected directly to the end server, or to an intermediary along the way.
Uniform interface: The uniform interface between clients and servers simplifies and decouples the architecture, which enables each part to evolve independently.
Code on demand (optional): Servers are able to temporarily extend or customize the functionality of a client by transferring logic to it that it can execute.
RESTful resource-oriented web services (Contd.)
In RESTful architectures, the method information goes into the HTTP method. In Resource-Oriented Architectures, the scoping information goes into the URI.
This combination is really powerful because given the first line of an HTTP request to a resource-oriented RESTful web service, one can easily understand basically what the client wants to do.
If the HTTP method doesn't match the method information, the service isn't RESTful. If the scoping information isn't in the URI, the service isn't resource-oriented. These aren't the only requirements, but they're good rules of thumb.
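To illustrate how much the first line of a request reveals in a resource-oriented RESTful service, the sketch below turns a request line into a plain-language description. The describe function, the verb table and the /photos/42 path are all hypothetical choices for this example.

```python
# What each HTTP method asks the server to do with the resource at the URI.
VERBS = {
    "GET": "retrieve",
    "HEAD": "retrieve metadata for",
    "PUT": "create or replace",
    "POST": "add to",
    "DELETE": "delete",
}


def describe(request_line: str) -> str:
    """Describe a request from its first line alone: method plus URI."""
    method, uri, _version = request_line.split(" ", 2)
    action = VERBS.get(method, "do something unknown to")
    return f"{action} {uri}"


print(describe("GET /photos/42 HTTP/1.1"))
print(describe("DELETE /photos/42 HTTP/1.1"))
```

This only works because the method information is in the HTTP method and the scoping information is in the URI. For a service that hides either one in the entity-body, the first line would say very little.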
RESTful resource-oriented web services (Contd.)
A few well-known examples of RESTful, resource-oriented web services include:
Services that expose the Atom Publishing Protocol
Amazon's Simple Storage Service (S3)
Most of Yahoo's web services
Static web sites
Many web applications, especially read-only ones like search engines
Type 2: RPC-Style Architectures
An RPC-style web service accepts an envelope full of data from its client, and sends a similar envelope back. The method and the scoping information are kept inside the envelope, or on stickers applied to the envelope.
HTTP is a popular envelope format, and so is SOAP. Transmitting a SOAP document over HTTP puts the SOAP envelope inside an HTTP envelope.
In this architecture, objects do not necessarily respond to the same basic interface.
The XML-RPC protocol for web services is the most obvious example of the RPC architecture. In this protocol, the method data and the scoping data are put inside an XML document. This XML document becomes the entity-body inside the HTTP envelope.
In XML-RPC, the XML document containing method and scoping information is put into an envelope for transfer to the server. The envelope is an HTTP request with a method, URI and headers.
The XML document changes depending on which method someone is calling, but the HTTP envelope is always the same.
Where a RESTful service would expose different URIs for different values of the scoping information, an RPC-style service typically exposes a URI for each Document Processor: something that can open the envelopes and transform them into software commands.
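Python's standard xmlrpc.client module can show what such an XML-RPC document looks like. In this sketch the method name photos.addTag and its arguments are invented for illustration. The point is only that both the method information and the scoping information end up inside the XML entity-body, while the HTTP envelope around it would stay the same for every call.

```python
import xmlrpc.client

# Build the XML document that would become the entity-body of the HTTP request.
# "photos.addTag" and its arguments are hypothetical, chosen for this sketch.
body = xmlrpc.client.dumps((42, "penguin"), methodname="photos.addTag")
print(body)

# The document processor on the server side opens the envelope and recovers
# the method information (method_name) and scoping information (params).
params, method_name = xmlrpc.client.loads(body)
print(method_name, params)
```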
Type 3: REST-RPC Hybrid Architectures
This is a term used for describing web services that fit somewhere in between the RESTful web services and the purely RPC-style services.
Flickr's web service is an example. Despite the "rest" in its URI, it was clearly designed as an RPC-style service, one that uses HTTP as its envelope format. It has the scoping information in the URI, just like RESTful resource-oriented services, but the method information also goes in the URI. It gives the illusion of behaving like a RESTful web service, but it isn't one.
RESTful Design Rules
The REST architectural style describes the following six constraints applied to the architecture, while leaving the implementation of the individual components free to design:
Client-server
Stateless
Cacheable
Layered system
Code on demand (optional)
Uniform interface
Rule 1: Client-server
Clients are separated from servers by a uniform interface.
This separation of concerns means that, for example, clients are not concerned with data storage, which remains internal to each server, so that the portability of client code is improved.
Servers are not concerned with the user interface or user state, so that servers can be simpler and more scalable.
Servers and clients may also be replaced and developed independently, as long as the interface is not altered.
Rule 2: Stateless
The client–server communication is further constrained by no client context being stored on the server between requests.
Each request from any client contains all of the information necessary to service the request, and any session state is held in the client.
The server can be stateful; this constraint merely requires that server-side state be addressable by URL as a resource.
This not only makes servers more visible for monitoring, but also makes them more reliable in the face of partial network failures as well as further enhancing their scalability.
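A stateless handler can be sketched as a function whose result depends only on the request it receives. Everything here is hypothetical: the ITEMS data, the request dictionary and its page and page_size keys stand in for a real resource collection and real request parameters.

```python
# Server-side data (a resource collection); this is not client session state.
ITEMS = [f"item-{n}" for n in range(1, 11)]


def handle(request: dict) -> dict:
    """Serve one page of items using only what the request itself contains.

    The server remembers nothing about which page this client saw last;
    the client says which page it wants every time.
    """
    page = request["page"]
    size = request["page_size"]
    start = (page - 1) * size
    return {"items": ITEMS[start:start + size], "next_page": page + 1}


first = handle({"page": 2, "page_size": 3})
second = handle({"page": 2, "page_size": 3})
print(first)
print(first == second)  # the same request always yields the same answer
```

Because no per-client context is kept, any server instance could answer either call, which is where the reliability and scalability benefits come from.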
Rule 3: Cacheable
As on the World Wide Web, clients are able to cache responses.
Responses must therefore, implicitly or explicitly, define themselves as cacheable, or not, to prevent clients reusing stale or inappropriate data in response to further requests.
Well-managed caching partially or completely eliminates some client–server interactions, further improving scalability and performance.
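In HTTP, a response usually declares its cacheability through the Cache-Control header. The sketch below is a deliberate simplification: cache_lifetime is a hypothetical helper that looks only at the no-store, no-cache and max-age directives and ignores everything else a real cache would consider (Expires, validators, heuristic freshness and so on).

```python
def cache_lifetime(headers: dict) -> int:
    """Return how many seconds a response may be reused without asking again.

    Simplified on purpose: 0 means "do not reuse without contacting the
    server", which covers no-store, no-cache and a missing max-age.
    """
    raw = headers.get("Cache-Control", "")
    directives = [d.strip().lower() for d in raw.split(",")]
    if "no-store" in directives or "no-cache" in directives:
        return 0
    for directive in directives:
        if directive.startswith("max-age="):
            return int(directive.split("=", 1)[1])
    return 0


print(cache_lifetime({"Cache-Control": "public, max-age=3600"}))
print(cache_lifetime({"Cache-Control": "no-store"}))
print(cache_lifetime({}))
```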
Rule 4: Layered system
A client cannot ordinarily tell whether it is connected directly to the end server, or to an intermediary along the way.
Intermediary servers may improve system scalability by enabling load balancing and by providing shared caches. They may also enforce security policies.
Rule 5: Code on demand (optional)
Servers are able to temporarily extend or customize the functionality of a client by transferring logic to it that it can execute.
Rule 6: Uniform interface
The uniform interface between clients and servers simplifies and decouples the architecture, which enables each part to evolve independently. The four guiding principles of this interface are -
Identification of resources
Manipulation of resources through representations
Self-descriptive messages
Hypermedia as the engine of application state
Uniform interface Guideline 1: Identification of resources
Individual resources are identified in requests, for example using URIs in web-based REST systems.
The resources themselves are conceptually separate from the representations that are returned to the client.
For example, the server does not send its database. Instead it might send some HTML, XML or JSON that represents some database records, expressed, for instance, in Bengali and encoded in UTF-8, depending on the details of the request and the server implementation.
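The separation between a resource and its representations can be sketched as one record rendered in two formats. The RECORD data and the represent function are hypothetical, and the exact-match check on the Accept value is a simplification of real content negotiation.

```python
import json
import xml.etree.ElementTree as ET

# The resource as the server stores it (a stand-in for a database record).
RECORD = {"id": 42, "title": "Penguin"}


def represent(record: dict, accept: str) -> tuple[str, str]:
    """Return (media type, document) for the representation the client asked for."""
    if accept == "application/xml":
        root = ET.Element("photo")
        for key, value in record.items():
            ET.SubElement(root, key).text = str(value)
        return "application/xml", ET.tostring(root, encoding="unicode")
    # Default representation in this sketch: JSON.
    return "application/json", json.dumps(record)


print(represent(RECORD, "application/xml"))
print(represent(RECORD, "application/json"))
```

Both documents describe the same resource, and neither of them is the stored record itself.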
Uniform interface Guideline 2: Manipulation of resources through these representations
When a client holds a representation of a resource, including any metadata attached, it has enough information to modify or delete the resource on the server, provided it has permission to do so.
A client can modify the resource with a PUT or POST request, and delete it with a DELETE request.
Uniform interface Guideline 3: Self-descriptive messages
Each message from the server includes enough information to describe how the message can be processed by the client.
For example, which parser to invoke may be specified by an Internet media type (previously known as a MIME type).
Responses also explicitly indicate their cacheability.
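Choosing a parser from the media type might look like the following sketch. The PARSERS table and parse_message function are hypothetical. A real client would support more media types and handle character sets properly.

```python
import json
import xml.etree.ElementTree as ET

# Media type -> function that can parse a body of that type.
PARSERS = {
    "application/json": json.loads,
    "application/xml": ET.fromstring,
}


def parse_message(content_type: str, body: str):
    """Pick the parser named by the Content-Type header and apply it."""
    # Drop parameters such as "; charset=utf-8" to get the bare media type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in PARSERS:
        raise ValueError(f"no parser for media type {media_type!r}")
    return PARSERS[media_type](body)


print(parse_message("application/json; charset=utf-8", '{"id": 42}'))
print(parse_message("application/xml", "<photo id='42'/>").get("id"))
```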
Uniform interface Guideline 4: Hypermedia as the engine of application state
Clients make state transitions only through actions that are dynamically identified within hypermedia by the server (e.g. by hyperlinks within hypertext).
Except for simple fixed entry points to the application, a client does not assume that any particular action will be available for any particular resource beyond those described in representations previously received from the server.
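A client driven by hypermedia only follows links that the server has actually placed in a representation. In the sketch below, the order document, its links list and the rel names are all invented for illustration and do not follow any particular hypermedia format.

```python
def available_actions(representation: dict) -> dict:
    """Map each link relation offered by the server to its target URI."""
    return {link["rel"]: link["href"] for link in representation.get("links", [])}


# A hypothetical representation received from the server.
order = {
    "id": 7,
    "status": "unpaid",
    "links": [
        {"rel": "self", "href": "/orders/7"},
        {"rel": "payment", "href": "/orders/7/payment"},
    ],
}

actions = available_actions(order)
# The client moves to the next state only through a link it was given.
next_uri = actions.get("payment")
print(next_uri)
# No "cancel" link was offered, so the client does not assume one exists.
print(actions.get("cancel"))
```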