3. Architecture of WWW
● The WWW today is a distributed client-server service, in which a client
using a browser can access a service using a server. However, the
service provided is distributed over many locations called sites.
● Each site holds one or more documents, referred to as Web pages.
Each Web page, however, can contain some links to other Web pages
in the same or other sites. Each Web page is a file with a name and
address.
● A Web page can be simple or composite.
○ A simple Web page has no links to other Web pages.
○ A composite Web page has one or more links to other Web
pages.
4. Simple webpage
The document of the web page is in a single file. It can be retrieved
using a single request/response transaction.
5. Composite webpage
The main document and the contents are stored in two separate files in
the same site (file A and file B); the referenced text file is stored in
another site (file C). Since we are dealing with three different files, we
need three transactions if we want to see the whole document.
6. Hypertext and Hypermedia
● Hypertext documents are documents that refer to other documents. In
a hypertext document, a part of the text can be defined as a link to
another document.
● Hypermedia is a term applied to a document that contains links to
other textual documents or to documents containing graphics, video,
or audio.
8. WEB CLIENT (Browser)
● A browser interprets and displays a Web document, and
nearly all vendors use the same architecture.
● Each browser usually consists of three parts:
○ A controller
○ Client protocol
○ Interpreters
9. The controller receives input from the keyboard or the mouse and uses
the client programs to access the document. After the document has been
accessed, the controller uses one of the interpreters to display the
document on the screen. The client protocol can be FTP, TELNET, or
HTTP. The interpreter can be HTML, Java, or JavaScript, depending on the
type of document. Some commercial browsers include Internet Explorer,
Netscape Navigator, and Firefox.
10. Web Server
● The Web page is stored at the server.
● Each time a client request arrives, the corresponding document is
sent to the client. To improve efficiency, servers normally store
requested files in a cache in memory; it is faster to access than disk.
● A server can also become more efficient through multithreading or
multiprocessing. In this case, a server can answer more than one
request at a time.
● Some popular Web servers include Apache and Microsoft Internet
Information Server.
11. Uniform Resource Locator (URL)
● A client that wants to access a Web page needs the file name
and the address.
● The uniform resource locator (URL) is a standard locator for
specifying any kind of information on the Internet. The URL
defines four things: protocol, host computer, port, and path.
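As a minimal sketch, the four parts of a URL can be pulled apart with Python's standard `urllib.parse` module. The URL below is a hypothetical example, not one taken from these slides.

```python
from urllib.parse import urlsplit

# Hypothetical URL used only for illustration.
url = "http://www.example.com:8080/docs/index.html"

parts = urlsplit(url)
protocol = parts.scheme    # the protocol, here "http"
host = parts.hostname      # the host computer
port = parts.port          # the port (None when the URL omits it)
path = parts.path          # the path of the file on the host

print(protocol, host, port, path)
```

When the port is omitted, the client falls back to the protocol's well-known port.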
12. WEB DOCUMENTS
● The documents in the WWW can be grouped into three broad
categories:
○ Static
○ Dynamic
○ Active
● The category is based on the time the contents of the document are
determined.
13. STATIC DOCUMENTS
● Static documents are fixed-content documents that are created and
stored in a server. The client can get a copy of the document only.
● Static documents are prepared using one of several languages:
Hypertext Markup Language (HTML), Extensible Markup Language
(XML), Extensible Stylesheet Language (XSL), and Extensible Hypertext
Markup Language (XHTML).
14. DYNAMIC DOCUMENTS
● A dynamic document is created by a Web server whenever a browser
requests the document.
● When a request arrives, the Web server runs an application program or
a script that creates the dynamic document. The server returns the
output of the program or script as a response to the browser that
requested the document.
● Because a fresh document is created for each request, the contents of
a dynamic document may vary from one request to another.
● E.g., retrieval of the time and date from a server. Time and date are
kinds of information that are dynamic in that they change from moment
to moment.
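The time-and-date example can be sketched in Python as follows. The function name `build_time_page` and the HTML layout are assumptions made for illustration; a real server would call such a program once per request.

```python
from datetime import datetime

def build_time_page(now: datetime) -> str:
    """Create a fresh HTML document whose content depends on the request time."""
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    return f"<html><body><p>Server time: {stamp}</p></body></html>"

# Two requests arriving five seconds apart receive different documents.
first = build_time_page(datetime(2024, 1, 1, 12, 0, 0))
second = build_time_page(datetime(2024, 1, 1, 12, 0, 5))
print(first)
print(second)
```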
15. ● The Common Gateway Interface (CGI) is a technology that creates and
handles dynamic documents. CGI is a set of standards that defines
how a dynamic document is written, how data are input to the
program, and how the output result is used.
● CGI is not a new language; instead, it allows programmers to use any
of several languages such as C, C++, Bourne Shell, Korn Shell, C
Shell, Tcl, or Perl. The only thing that CGI defines is a set of rules and
terms that the programmer must follow.
● If we use CGI, the program must create an entire document each time
a request is made
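A minimal sketch of a CGI-style program in Python is shown here, assuming the Web server passes the query string in the `QUERY_STRING` environment variable and reads the program's standard output. The parameter `name` and the function `run_cgi` are hypothetical.

```python
import html
import os
import sys
from urllib.parse import parse_qs

def run_cgi(environ=os.environ, out=sys.stdout) -> None:
    """Write a complete CGI response: header, blank line, then the whole document."""
    # Input data reach the program through environment variables.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]   # "name" is a hypothetical parameter

    # The output starts with a header and a blank line, followed by the
    # entire document, which is rebuilt on every request.
    out.write("Content-Type: text/html\n\n")
    out.write(f"<html><body><p>Hello, {html.escape(name)}!</p></body></html>\n")

if __name__ == "__main__":
    run_cgi()
```

The same rules could be followed in C, Perl, or a shell script; only the input and output conventions are fixed.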
16. Scripting Technologies for Dynamic Documents:
● The problem with CGI technology is the inefficiency that results if part
of the dynamic document that is to be created is fixed and not
changing from request to request. If we use CGI, the program must
create an entire document each time a request is made.
● The solution is to create a file containing the fixed part of the document
using HTML and embed a script (source code) that can be run by the
server to provide the varying data section.
● E.g., Hypertext Preprocessor (PHP), Java Server Pages (JSP), Active
Server Pages (ASP), and ColdFusion.
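The idea of a fixed HTML part with an embedded varying section can be imitated in Python with `string.Template`. This is only an analogy to PHP, JSP, or ASP, not how those technologies are implemented; the placeholder `$server_time` and the function `render` are assumptions.

```python
from datetime import datetime
from string import Template

# Fixed part of the document, written once as HTML. $server_time marks the
# varying section (a hypothetical placeholder name).
PAGE = Template(
    "<html><body><h1>Welcome</h1>"
    "<p>Server time: $server_time</p>"
    "</body></html>"
)

def render(now: datetime) -> str:
    """Fill in only the varying section; the fixed HTML is reused unchanged."""
    return PAGE.substitute(server_time=now.strftime("%H:%M:%S"))

page = render(datetime(2024, 1, 1, 9, 30, 0))
print(page)
```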
17. Active Documents
● For many applications, we need a program or a script to be run at the
client site. These are called active documents.
● For example, suppose we want to run a program that creates
animated graphics on the screen or a program that interacts with the
user. When a browser requests an active document, the server sends
a copy of the document or a script. The document is then run at the
client (browser) site.
● E.g., Java applets and JavaScript.
19. ● The Hypertext Transfer Protocol (HTTP) is an application-layer
protocol used mainly to access data on the World Wide Web.
● The data transferred between the client and the server look like
SMTP messages. The format of the messages is controlled by
MIME-like headers.
● Unlike SMTP, the HTTP messages are not destined to be read by
humans; they are read and interpreted by the HTTP server and HTTP
client (browser).
20. ● SMTP messages are stored and forwarded, but HTTP
messages are delivered immediately.
● The commands from the client to the server are embedded in a
request message. The contents of the requested file or other
information are embedded in a response message.
● HTTP uses the services of TCP on well-known port 80.
21. HTTP Transaction: Although HTTP uses the services of TCP,
HTTP itself is a stateless protocol, which means that the server
does not keep information about the client. The client initializes the
transaction by sending a request. The server replies by sending a
response.
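The request/response transaction can be tried locally with Python's standard `http.server` and `http.client` modules. This is a sketch under assumptions: the handler `HelloHandler`, the helper `one_transaction`, and the loopback address are illustrative, and the server runs on an ephemeral port rather than port 80. The handler keeps no information about the client, so two identical requests are answered identically.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Stateless handler: nothing about the client is remembered between requests."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet

def one_transaction(port: int) -> tuple[int, bytes]:
    """The client initializes the transaction; the server replies with a response."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", "/")
    resp = conn.getresponse()
    data = resp.read()
    conn.close()
    return resp.status, data

server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

results = [one_transaction(port), one_transaction(port)]
server.shutdown()
server.server_close()
print(results)
```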
23. 1. Request Line:
● The first line in a request message is called a request line. There are
three fields in this line.
● The fields are called
○ Method
○ URL
○ Version
● These three fields are separated by a space character. At the end,
two characters, a carriage return followed by a line feed, terminate
the line.
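A minimal sketch of splitting a request line into its three fields follows. The function name `parse_request_line` and the sample request are assumptions for illustration.

```python
def parse_request_line(line: bytes) -> tuple[str, str, str]:
    """Split a request line into (method, URL, version)."""
    # The line must be terminated by carriage return + line feed.
    if not line.endswith(b"\r\n"):
        raise ValueError("request line must end with CR LF")
    fields = line[:-2].decode("ascii").split(" ")
    if len(fields) != 3:
        raise ValueError("expected exactly three space-separated fields")
    method, url, version = fields
    return method, url, version

print(parse_request_line(b"GET /index.html HTTP/1.1\r\n"))
```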
24. a. HTTP Methods: An HTTP request method specifies the
action to perform through the request. For example, PUT
places some content at the specified URL.
25. b. The requested URL
The URL typically functions as a name for the resource being requested,
together with an optional query string containing parameters that the
client is passing to that resource.
c. The HTTP version being used
The versions described in these slides are HTTP 1.0 and 1.1; version 1.1
is the usual default for this text-based message format, although the
newer HTTP/2 and HTTP/3 are now also in common use.
26. 2. Header Lines:
● After the request line, we can have zero or more request header
lines.
● Each header line sends additional information from the client to the
server.
● For example, the client can request that the document be sent in a
special format. Using the request headers, the client can send
additional information to the server about the request as well as
about the client itself.
27. There can be zero or more headers in the request.
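As a sketch, a request message can be assembled from a request line followed by its header lines. The helper `build_request` and the header values are hypothetical; `Accept` is used here as one way for a client to ask for a particular format.

```python
def build_request(method: str, url: str, headers: dict[str, str]) -> bytes:
    """Build a request line plus zero or more header lines."""
    lines = [f"{method} {url} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Each line ends with CR LF; an empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

request = build_request(
    "GET",
    "/index.html",
    {"Host": "www.example.com", "Accept": "text/html"},
)
print(request.decode("ascii"))
```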
30. 1. The Status Line
● Status Line of every HTTP response consists of three items,
separated by spaces:
○ The HTTP version being used.
○ A numeric status code indicating the result of the request.
■ 200 is the most common status code; it means that the
request was successful and that the requested resource is
being returned.
○ A textual “reason phrase” further describing the status of the
response. This can have any value and is not used for any
purpose by current browsers.
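The three items of a status line can be separated as in this sketch. The function name `parse_status_line` is an assumption; the split is limited to the first two spaces because the reason phrase may itself contain spaces.

```python
def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split a status line into (version, status code, reason phrase)."""
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), reason

print(parse_status_line("HTTP/1.1 200 OK\r\n"))
print(parse_status_line("HTTP/1.1 404 Not Found\r\n"))
```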
33. 2. Header Lines in Response Message: After the status
line, we can have zero or more response header lines. Each
header line sends additional information from the server to the
client.
34. 3. Body
● The body contains the document to be sent from the server to the
client. The body is present unless the response is an error message.
● It consists of the resource data requested by the client. E.g., if we
requested the book data, the response body consists of the different
books present in the database along with their information.
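Putting the three parts together, a response message can be split into status line, header lines, and body as sketched below. The function `split_response` and the one-book JSON body are made-up illustrations, not data from these slides.

```python
def split_response(raw: bytes) -> tuple[str, dict[str, str], bytes]:
    """Split a response into (status line, headers, body)."""
    # An empty line separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body

# Hypothetical response carrying book data in the body.
book_data = b'[{"title": "Example Book"}]'
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/json\r\n"
    b"Content-Length: %d\r\n"
    b"\r\n" % len(book_data)
) + book_data

status_line, headers, body = split_response(raw)
print(status_line, headers, body)
```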
37. Persistence
HTTP, prior to version 1.1, specified a nonpersistent connection, while a
persistent connection is the default in version 1.1.
a. Nonpersistent Connection:
In a nonpersistent connection, one TCP connection is made for each
request/response.
The following lists the steps in this strategy:
1. The client opens a TCP connection and sends a request.
2. The server sends the response and closes the connection.
3. The client reads the data until it encounters an end-of-file marker; it
then closes the connection.
38. b. Persistent Connection
● HTTP version 1.1 specifies a persistent connection by default. In a
persistent connection, the server leaves the connection open for
more requests after sending a response.
● The server can close the connection at the request of a client or if a
time-out has been reached.
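A persistent connection can be observed locally with the standard library, as in this sketch. The handler `KeepAliveHandler` and the paths `/a` and `/b` are assumptions; setting `protocol_version = "HTTP/1.1"` is what lets `http.server` keep the connection open, and the client then sends two requests over the same TCP socket.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # leave the connection open after a response

    def do_GET(self):
        body = self.path.encode("ascii")
        self.send_response(200)
        # Content-Length tells the client where this response ends,
        # so the connection does not have to be closed to mark the end.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet

server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
bodies, sockets = [], []
for path in ("/a", "/b"):
    conn.request("GET", path)
    resp = conn.getresponse()
    bodies.append(resp.read())
    sockets.append(conn.sock)     # the underlying TCP socket in use

conn.close()                      # the client asks to end the connection
server.shutdown()
server.server_close()
print(bodies, sockets[0] is sockets[1])
```

In the nonpersistent strategy, each of the two requests would instead open and close its own TCP connection.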
39. Cookies:
● The World Wide Web was originally designed as a stateless entity. A
client sends a request; a server responds. Their relationship is over.
● Cookies are small pieces of text data that are used to identify your
computer (browser) as you use a computer network.
○ When a server receives a request from a client, it stores
information about the client in a file or a string.
○ The server includes the cookie in the response that it sends to the
client.
○ When the client receives the response, the browser stores the
cookie in the cookie directory, which is sorted by the domain
server name.
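The steps above can be sketched with Python's `http.cookies` module. The cookie name `session_id`, its value, the domain, and the in-memory "cookie directory" are all hypothetical; a real browser stores cookies persistently and applies further matching rules. The last helper shows the cookie being sent back on a later request, which is how the server recognizes the client.

```python
from http.cookies import SimpleCookie

# Server side: place information about the client in a cookie string.
server_cookie = SimpleCookie()
server_cookie["session_id"] = "abc123"                  # hypothetical name/value
server_cookie["session_id"]["domain"] = "www.example.com"
server_cookie["session_id"]["path"] = "/"
set_cookie_value = server_cookie["session_id"].OutputString()
print("Set-Cookie:", set_cookie_value)

# Client side: a tiny stand-in for the cookie directory, keyed by domain name.
cookie_jar: dict[str, dict[str, str]] = {}

def store_cookie(jar: dict, header_value: str, default_domain: str) -> None:
    """Store each cookie from a Set-Cookie value under its domain."""
    parsed = SimpleCookie()
    parsed.load(header_value)
    for name, morsel in parsed.items():
        domain = morsel["domain"] or default_domain
        jar.setdefault(domain, {})[name] = morsel.value

def cookie_header(jar: dict, domain: str) -> str:
    """Build the Cookie header value to send on a later request to this domain."""
    return "; ".join(f"{k}={v}" for k, v in jar.get(domain, {}).items())

store_cookie(cookie_jar, set_cookie_value, "www.example.com")
print("Cookie:", cookie_header(cookie_jar, "www.example.com"))
```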
40. Web Caching: Proxy Server
● A proxy server is a computer that keeps copies of responses to
recent requests.
● The HTTP client sends a request to the proxy server. The proxy
server checks its cache. If the response is not stored in the cache,
the proxy server sends the request to the corresponding server.
● Incoming responses are sent to the proxy server and stored for
future requests from other clients.
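The cache check performed by a proxy can be sketched as follows. The class `CachingProxy` and the stand-in `fake_origin` function are assumptions; the sketch omits expiry and validation, which a real proxy needs in order to avoid serving stale copies.

```python
from typing import Callable

class CachingProxy:
    """Keeps copies of responses to recent requests, keyed by URL."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]):
        self._fetch = fetch_from_origin
        self._cache: dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        # Serve from the cache when a copy of the response is stored.
        if url in self._cache:
            return self._cache[url]
        # Otherwise forward the request to the corresponding server and
        # store the incoming response for future requests.
        response = self._fetch(url)
        self._cache[url] = response
        return response

origin_hits = []

def fake_origin(url: str) -> bytes:
    """Stand-in for the real server; records which requests reach it."""
    origin_hits.append(url)
    return f"content of {url}".encode()

proxy = CachingProxy(fake_origin)
proxy.get("http://www.example.com/a")
proxy.get("http://www.example.com/a")   # answered from the cache
proxy.get("http://www.example.com/b")
print(origin_hits)
```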
41. Proxy Server Location
The proxy servers are normally located at the client site. This means that
we can have a hierarchy of proxy servers as shown below:
1. A client computer can also be used as a proxy server in a small
capacity that stores responses to requests often invoked by the
client.
2. In a company, a proxy server may be installed on the computer
LAN to reduce the load going out of and coming into the LAN.
3. An ISP with many customers can install a proxy server to reduce
the load going out of and coming into the ISP network.
42. HTTP SECURITY:
● HTTP by itself does not offer security. HTTP run over the Secure
Sockets Layer (SSL) is referred to as HTTPS.
● HTTPS provides confidentiality, client and server authentication, and
data integrity.
● SSL (today superseded by its successor, Transport Layer Security, TLS)
is the standard security technology for establishing an encrypted
link between two systems. These can be browser to server, server
to server, or client to server. Basically, SSL ensures that the data
transferred between the two systems remains encrypted and private.
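A minimal sketch of preparing an HTTPS client in Python is given below. The host name is a placeholder, and no network connection is opened until a request is actually sent. Python's `ssl` module negotiates TLS, the successor of SSL; the default context verifies the server's certificate and host name, which is the server-authentication part of HTTPS.

```python
import http.client
import ssl

# Default settings: certificate verification and host name checking are on.
context = ssl.create_default_context()

# Placeholder host; the connection is only prepared here, not opened.
conn = http.client.HTTPSConnection("www.example.com", context=context)

print(conn.port)                                  # HTTPS default port
print(context.verify_mode == ssl.CERT_REQUIRED)   # server certificate required
print(context.check_hostname)                     # host name must match
```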