An Introduction to the Cyber World for a Newbie
Back in the 1960s, the foundations of the internet that we use today were
laid through the contributions of several people. The initial idea is
credited to Leonard Kleinrock, a computer science professor at the
University of California, Los Angeles (UCLA), following the publication of
his first paper, titled “Information Flow in Large Communication Nets”.
Initially, the internet was not public; its forerunner was ARPAnet
(Advanced Research Projects Agency Network). TCP/IP, the communication
standard that governs data transfer on the Internet today, was developed
for ARPAnet.
What is a Protocol?
A protocol is a standardized means of communication among machines
across a network. These rules or set of established procedures determine
the format and transmission of data. Protocols allow data to be taken apart
for faster transmission, transmitted, and then reassembled at the
destination in the correct order.
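The "take apart, transmit, reassemble" idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names, the chunk size and the sample message are made up for the example, and real protocols such as TCP add headers, checksums and retransmission on top of this.

```python
import random

def split_into_packets(message: str, size: int) -> list[tuple[int, str]]:
    """Cut a message into numbered chunks of at most `size` characters."""
    return [(seq, message[start:start + size])
            for seq, start in enumerate(range(0, len(message), size))]

def reassemble(packets: list[tuple[int, str]]) -> str:
    """Sort packets by sequence number and join their payloads."""
    return "".join(payload for _, payload in sorted(packets))

# Hypothetical sample message, used only for this illustration.
message = "Protocols let machines agree on how to talk."
packets = split_into_packets(message, size=8)
random.shuffle(packets)        # simulate packets arriving out of order
restored = reassemble(packets)
print(restored == message)     # prints True
```

The sequence number carried by each packet is what lets the receiver restore the correct order, whatever order the packets arrive in.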
Tim Berners-Lee led the development of the World Wide Web (WWW), defining
the Hypertext Markup Language (HTML) used to create web pages, the
Hypertext Transfer Protocol (HTTP) and Uniform Resource Locators (URLs).
These developments took place between 1989 and 1991. Tim Berners-Lee went
on to found the World Wide Web Consortium (W3C), the group that sets
technical standards for the Web, and served as its Director.
WWW is an abbreviation of World Wide Web, sometimes simply called the Web.
It is a system of all the resources (such as FTP, telnet, Usenet) and users
on Internet servers that support specially formatted documents called Web
pages. These pages include hyperlinked text, audio, and video files that
browsers can access and search using standards such as HTTP and TCP/IP.
It was created in 1989 by the British scientist Tim Berners-Lee
while working at the European Particle Physics Laboratory (called CERN).
The World Wide Web consists of all the public Web sites connected to the
Internet worldwide, including the client devices (such as computers and
cell phones) that access Web content. The WWW is just one of many
applications of the Internet and computer networks.
A broader definition comes from the World Wide Web Consortium or W3C (the
organization founded by Tim Berners-Lee): "The World Wide Web is the
universe of network-accessible information, an embodiment of human
knowledge." Several applications, called Web browsers, make it easy to
access the World Wide Web.
The World Wide Web is based on these technologies:
• HTML - Hypertext Markup Language
• HTTP - Hypertext Transfer Protocol
• Web browsers and web servers
HTML stands for HyperText Markup Language. Also known as the mother
tongue of the browser, it is the authoring language used to create
documents on the World Wide Web, and hence the publishing language of the
Web. HTML is modelled on SGML (Standard Generalized Markup Language),
although it is not a strict subset.
HTML is a language that makes it possible to present information (e.g.
scientific research) on the Internet.
Developed by scientist Tim Berners-Lee in 1990, HTML is the "hidden" code
that helps us communicate with others on the World Wide Web (WWW).
The purpose was to make it easier for scientists at different universities to
gain access to each other's research documents. The project became a
bigger success than Tim Berners-Lee had ever imagined. By inventing
HTML he laid the foundation for the web as it is known today.
• Hyper is the opposite of linear. Old-fashioned computer programs
were necessarily linear - that is, they had a specific order. But with a
"hyper" language such as HTML, the user can jump anywhere within a page
or across the web at any time.
• Text is just what you're looking at now - English characters used to
make up ordinary words.
• Mark-up - HTML defines the structure and layout of a Web document
by using a variety of tags and attributes. The correct structure for an
HTML document starts with <HTML><HEAD> (enter here what the
document is about) </HEAD><BODY> and ends with </BODY></HTML>. All
the information one would like to show in the Web page fits in
between the <BODY> and </BODY> tags. There are many other
tags used to format and lay out the information in a Web page. Tags
are also used to specify hypertext links. These allow Web developers
to direct users to other Web pages with only a click of the mouse on
either an image or word(s).
• Language is just that. HTML is the language that computers read in
order to understand web pages.
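To make the tag structure described above concrete, here is a minimal sketch that feeds a tiny HTML document to the parser in Python's standard library and lists the tags and hyperlinks it finds. The page content, the file name page2.html and the class name TagLister are invented for illustration.

```python
from html.parser import HTMLParser

# A tiny, made-up page following the HTML/HEAD/BODY structure.
PAGE = """<html>
<head><title>My first page</title></head>
<body>
<p>Hello, Web!</p>
<a href="page2.html">Next page</a>
</body>
</html>"""

class TagLister(HTMLParser):
    """Record every opening tag and every hyperlink target."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "a":  # the anchor tag carries the hypertext link
            self.links.extend(value for name, value in attrs if name == "href")

lister = TagLister()
lister.feed(PAGE)
print(lister.tags)   # ['html', 'head', 'title', 'body', 'p', 'a']
print(lister.links)  # ['page2.html']
```

A browser does something similar, on a much larger scale, before it decides how to draw the page.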
The first version of HTML was described by Tim Berners-Lee in late 1991.
For its first five years (1990-1995), HTML went through a number of
revisions and experienced a number of extensions, primarily hosted first at
CERN, and then at the IETF.
With the creation of the World Wide Web Consortium (W3C), HTML's
development changed venue again. HTML is a formal recommendation of
the W3C and is generally adhered to by the major browsers, such as
Microsoft's Internet Explorer.
A first abortive attempt at extending HTML in 1995, known as HTML 3.0,
then made way for a more pragmatic approach known as HTML 3.2, which
was completed in 1997. HTML 4 followed, reaching completion in 1998, and
was revised as HTML 4.01 in 1999. Significant features in HTML 4
are sometimes described in general as dynamic HTML.
Extensible Hypertext Markup Language (XHTML) is a reformulation of
HTML 4 in XML and is sometimes confused with HTML 5. HTML 5 is a
separate, later specification that became a W3C recommendation in 2014.
It is supported by current browsers, and HTML is now maintained as a
continuously updated living standard.
What we see when we view a page on the Internet is the browser's
interpretation of HTML. To see the HTML code of a page on the Internet,
simply right-click on the page and choose "View Page Source" (the exact
wording varies from browser to browser).
HTTP is the protocol, or language, in which information is passed back
and forth between web servers and clients. It allows information to be
transmitted and received across the internet.
If a website communicates with the browser over plain http, the
connection is unsecured and anyone on the network path can snoop on the
computer's conversation with the website.
All the user information is contained in the HTTP headers, cookies and
query parameters.
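As a rough illustration of where that information sits, the sketch below takes the text of a plain HTTP request and pulls out the headers, cookies and query parameters. The request itself (host www.example.com, the cookie values, the search terms) is invented for the example, and the parsing is deliberately simplified compared with a real web server.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit

# A made-up request, exactly as it would travel over plain HTTP.
RAW_REQUEST = (
    "GET /search?q=html&lang=en HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "User-Agent: ExampleBrowser/1.0\r\n"
    "Cookie: session=abc123; theme=dark\r\n"
    "\r\n"
)

def parse_request(raw: str):
    """Return (method, headers, query parameters, cookies) of a request."""
    head = raw.split("\r\n\r\n", 1)[0]               # drop the (empty) body
    request_line, *header_lines = head.split("\r\n")
    method, target, _version = request_line.split(" ")
    headers = dict(line.split(": ", 1) for line in header_lines)
    query = parse_qs(urlsplit(target).query)
    jar = SimpleCookie(headers.get("Cookie", ""))
    cookies = {name: morsel.value for name, morsel in jar.items()}
    return method, headers, query, cookies

method, headers, query, cookies = parse_request(RAW_REQUEST)
print(headers["Host"])   # www.example.com
print(query)             # {'q': ['html'], 'lang': ['en']}
print(cookies)           # {'session': 'abc123', 'theme': 'dark'}
```

Everything printed here is readable by anyone who can see the traffic, which is why the session cookie in particular needs protection.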
HTTPS (= HTTP + ‘S’) is a URI scheme identical in syntax to the http
scheme, where ‘S’ stands for “secure”.
It is a simple layering of HTTP over the SSL/TLS protocols to protect the
traffic, thus adding security capabilities to standard HTTP
communications. It authenticates the website and the associated web
server that one is dealing with.
It protects the user from man-in-the-middle attacks by:
• Encrypting information in both directions between the client and the
server, thus protecting against spying on and tampering with the data,
or the forging of communication.
• Ensuring that the communication between the user and the website
is not forged by a third person or an imposter.
HTTPS is especially important over unencrypted networks (such as open
Wi-Fi), as anyone on the same local network can "packet sniff" and discover
sensitive information. In addition, some free and even paid-for WLAN
networks use “packet injection” to serve their own advertisements on
webpages or just for tricks. This can be exploited maliciously, e.g. by
injecting malware and spying on users.
Whenever a website is loaded over http instead of https, the user's
information and session are exposed. It is therefore important to check for
https before filling in and submitting information to the server.
Another example where HTTPS is important is connections over Tor (the
anonymity network), as malicious Tor nodes can damage or alter the
contents passing through them in an insecure manner and inject malware
into the connection. For this security reason the Tor Project worked with
the Electronic Frontier Foundation on HTTPS Everywhere, which was
included in the Tor Browser Bundle.
The https scheme signals the browser to use an added encryption layer of
SSL/TLS to protect the traffic.
By examining the server’s certificate, a client can find out whether it is
talking to the genuine server or not.
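The certificate check can be sketched with Python's standard ssl module. The helper names below are hypothetical; the point is only that a default context refuses servers whose certificate cannot be verified or does not match the host name. The second function needs a network connection and is shown for illustration, not called.

```python
import socket
import ssl

def make_verifying_context() -> ssl.SSLContext:
    """Return a TLS context that checks the certificate chain and host name."""
    return ssl.create_default_context()

def fetch_certificate_subject(host: str, port: int = 443) -> dict:
    """Connect to `host` and return the subject fields of its certificate.

    Raises ssl.SSLCertVerificationError if the certificate is not trusted
    or was not issued for `host`. Not called here, as it needs a network.
    """
    context = make_verifying_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    # The subject is a tuple of one-element tuples of (name, value) pairs.
    return dict(field[0] for field in cert["subject"])

context = make_verifying_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True: a certificate is mandatory
print(context.check_hostname)                    # True: it must match the host name
```

A browser performs the same two checks before it shows the padlock for an https page.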
A Stark Contrast between HTTP and HTTPS
HTTP
• URLs begin with http://
• Operates on port 80 by default.
• It is insecure and vulnerable to man-in-the-middle and spying
attacks.
• It is slightly faster than HTTPS because nothing is encrypted. The
performance difference can become evident when large amounts of data
are transferred.
• Works at the application layer.
HTTPS
• URLs begin with https://
• Uses port 443 by default.
• The connection is secured and protected against man-in-the-middle
attacks, as all the information is encrypted before being sent to the
server.
• It is not a separate protocol but ordinary HTTP over encrypted
SSL/TLS (authentication comes in two forms: one-way, in which only the
server proves its identity, and mutual).
• Also works at the application layer; the SSL/TLS encryption sits
between HTTP and the transport layer (TCP).
The web server has to be prepared to accept https connections.
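The default ports mentioned in the comparison can be looked up programmatically. The sketch below is a small illustration: effective_port is a hypothetical helper, and the www.example.com addresses are placeholders.

```python
from http.client import HTTP_PORT, HTTPS_PORT
from urllib.parse import urlsplit

# Standard-library constants: 80 for HTTP, 443 for HTTPS.
DEFAULT_PORTS = {"http": HTTP_PORT, "https": HTTPS_PORT}

def effective_port(url: str) -> int:
    """Return the port a client would connect to for this URL."""
    parts = urlsplit(url)
    if parts.port is not None:         # an explicit port overrides the default
        return parts.port
    return DEFAULT_PORTS[parts.scheme]

print(effective_port("http://www.example.com/"))        # 80
print(effective_port("https://www.example.com/"))       # 443
print(effective_port("https://www.example.com:8443/"))  # 8443
```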
There is a sophisticated type of man-in-the-middle attack called the SSL
stripping attack, which was presented at the Black Hat Conference in 2009.
This type of attack defeats the security provided by HTTPS by changing
an https: link into an http: link, taking advantage of the fact that few
internet users actually type "https" into their browser interface: they get
to a secure site by clicking on a link, and thus are deceived into
believing that they are using secured HTTP when in fact they are using
normal HTTP. The attacker then communicates in the clear with the client.
This encouraged the development of a countermeasure called HTTP Strict
Transport Security (HSTS).
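With HTTP Strict Transport Security the server sends a Strict-Transport-Security response header, and the browser then refuses to use plain http for that site for the stated number of seconds. The following is a simplified sketch of that behaviour; the function names and the example header value are assumptions for illustration, and a real browser also tracks expiry times and which hosts a policy covers.

```python
def parse_hsts(header_value: str) -> dict:
    """Parse a Strict-Transport-Security header value into its directives."""
    policy = {"max_age": 0, "include_subdomains": False}
    for directive in header_value.split(";"):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            policy["max_age"] = int(value.strip('"'))
        elif name.lower() == "includesubdomains":
            policy["include_subdomains"] = True
    return policy

def upgrade_if_required(url: str, policy: dict) -> str:
    """Rewrite http:// to https:// while the site's policy is in force."""
    if policy["max_age"] > 0 and url.startswith("http://"):
        return "https://" + url[len("http://"):]
    return url

policy = parse_hsts("max-age=31536000; includeSubDomains")
print(policy)  # {'max_age': 31536000, 'include_subdomains': True}
print(upgrade_if_required("http://www.example.com/login", policy))
# https://www.example.com/login
```

Because the rewrite happens inside the browser, a stripped http: link never reaches the network in the first place.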
Web Server
The main function of a web server is to deliver web pages to clients on
request. This means delivery of HTML documents and any additional content
that may be included by a document, such as images, style sheets and
scripts. Not all Internet servers are part of the World Wide Web.
Web Browser
A web browser is a software application, installed on the computer itself,
that acts as the gateway to the Web. It is used to locate, retrieve and
display content on the World Wide Web, including Web pages, images, video
and other files. In the client/server model, the browser is the client
running on a computer that contacts the Web server and requests
information. The Web server sends the information back to the Web browser,
which displays the results on the computer or other internet-enabled device
that supports a browser.
A web browser communicates with a web server using the HTTP protocol to
download the pages requested by the user, usually by clicking on a
hyperlink. A browser translates HTML, the language used to create web
pages, into the content displayed in the browser window.
Popular web browsers include Google Chrome, Mozilla Firefox, Opera, and
Internet Explorer.
Web sites and Web browsing exploded in popularity during the mid-1990s.
URL is an abbreviation of Uniform Resource Locator. It was defined in 1994
by Tim Berners-Lee and the Internet Engineering Task Force (IETF) URI
working group.
A URL is a reference to a document or other resource on some machine on
the network. In other words, it is the global, unique address of a file that
is accessible on the Internet. It is also sometimes referred to as a link.
Such a file might be any Web (HTML) page, an image file, or a program such
as a Common Gateway Interface (CGI) application or Java applet.
A URL takes the form of a formatted text string used by Web browsers, email
clients and other software to identify a network resource on the Internet.
On the Web (which uses the Hypertext Transfer Protocol, or HTTP), an
example of a URL is:
It specifies the use of a HTTP (Web browser) application, a unique
computer named, and the location of a text file or page to
be accessed on that computer whose pathname is /abc/xyz.txt.
A URL for a particular image on a Web site might look like this:
A URL for a file meant to be downloaded using the File Transfer Protocol
(FTP) would require that the "ftp" protocol be specified like this
hypothetical URL:
The example uses the Hypertext Transfer Protocol (HTTP), which is
typically used to serve up hypertext documents.
This is how a computer locates the web page that you are trying to find.
URLs also can point to other resources on the network, such as database
queries and command output. Network resources are files that can be plain
Web pages, other text documents, graphics, or programs.
As stated earlier, a URL is a formatted string which consists of three parts:
1. Network protocol
2. Host name or address
3. File or resource location
These substrings are separated by special characters as follows:
protocol :// host / location
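The three-part layout can be demonstrated by cutting a URL at the separators by hand. This is a teaching sketch only: split_url is a hypothetical helper, www.example.com is a placeholder host, and real code would normally use a URL library rather than string splitting.

```python
def split_url(url: str) -> tuple[str, str, str]:
    """Split 'protocol://host/location' into its three substrings."""
    protocol, _, rest = url.partition("://")
    host, slash, location = rest.partition("/")
    return protocol, host, slash + location

print(split_url("http://www.example.com/abc/xyz.txt"))
# ('http', 'www.example.com', '/abc/xyz.txt')
print(split_url("http://www.example.com"))
# ('http', 'www.example.com', '')
```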
URL Protocol
The 'protocol' substring defines the network protocol to be used to access
a resource. These strings are short names followed by the three characters
'://' (a simple naming convention to denote a protocol definition). Typical
URL protocols include http:// and ftp://; mailto: is an exception that is
followed by a colon only.
The protocol indicates which method should be used to fetch the resource.
For example, the two URLs below point to two different files at the same domain. The first specifies an executable file that should be fetched
using the FTP protocol; the second specifies a Web page that should be
fetched using the HTTP protocol:
URL Host
The 'host' substring identifies a computer or other network device. Hosts
are registered in standard Internet databases such as DNS, and the
substring can be either the domain name or the IP address of the machine
where the resource is located. The resource name is the complete address
of the resource. The format of the resource name depends entirely on the
protocol used, but for many protocols, including HTTP, the resource name
contains one or more of the following components:
• Host Name
The name of the machine on which the resource lives.
• Filename
The pathname to the file on the machine.
• Port Number
The port number to which to connect (typically optional).
• Reference
A reference to a named anchor within a resource that usually
identifies a specific location within a file (typically optional).
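Python's standard urllib.parse module exposes these four components directly. In the sketch below the URL, including the port 8080 and the anchor chapter2, is made up for illustration, and describe is a hypothetical helper name.

```python
from urllib.parse import urlsplit

def describe(url: str) -> dict:
    """Return the host name, filename, port number and reference of a URL."""
    parts = urlsplit(url)
    return {
        "host_name": parts.hostname,
        "filename": parts.path,
        "port_number": parts.port,     # None when the URL gives no port
        "reference": parts.fragment,   # empty string when there is no anchor
    }

info = describe("http://www.example.com:8080/docs/guide.html#chapter2")
print(info["host_name"])    # www.example.com
print(info["filename"])     # /docs/guide.html
print(info["port_number"])  # 8080
print(info["reference"])    # chapter2
```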
URL Location
The 'location' substring contains a path to one specific network resource on
the host. Resources are normally located in a host directory or folder. For
example, /bin/accessibleobject/build-url.htm is the location of a Web page
including two subdirectories and the file name.
When the location element is omitted such as in,
the URL conventionally points to the root directory of the host and often a
home page (like 'index.html').
In simple terms, it is a pathname, a hierarchical description that specifies
the location of a file in that computer.
An example of a URL is: . In this
example URL, is called the domain name. The "index.html"
refers to the specific page.
Note: The protocol identifier and the resource name are separated by a
colon and two forward slashes.
HTTP is just one of many different protocols used to access different types
of resources on the net. Other protocols include File Transfer Protocol
(FTP), Gopher, File, and News.
For many protocols, the host name and the filename are required, while the
port number and reference are optional. For example, the resource name
for an HTTP URL must specify a server on the network (Host Name) and the
path to the document on that machine (Filename); it also can specify a port
number and a reference.
Absolute vs. Relative URLs
Full URLs featuring all three substrings are called absolute URLs. In some
cases, such as within Web pages, URLs can contain only the location
element. These are called relative URLs. Relative URLs are used for
efficiency by Web servers and a few other programs when they already
know the correct URL protocol and host.
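How a relative URL is completed from a known protocol and host can be shown with urljoin from Python's standard library. The base address and file names below are placeholders chosen for the example.

```python
from urllib.parse import urljoin

# The page the browser is currently showing (a made-up address).
base = "http://www.example.com/docs/index.html"

same_folder = urljoin(base, "guide.html")
from_root = urljoin(base, "/images/logo.png")
one_level_up = urljoin(base, "../about.html")

print(same_folder)    # http://www.example.com/docs/guide.html
print(from_root)      # http://www.example.com/images/logo.png
print(one_level_up)   # http://www.example.com/about.html
```

In each case the protocol and host are taken from the base URL, and only the location part comes from the relative URL.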
A URL is a type of URI (Uniform Resource Identifier, formerly called
Universal Resource Identifier), the generic term for all types of names
and addresses that refer to objects on the World Wide Web. The term "Web
address" is a synonym for a URL that uses the HTTP / HTTPS protocol.
The URL format is specified in RFC 1738 Uniform Resource Locators (URL).
Institutions on the Web
The chart below lists the types of institution that people may come
across while accessing the Internet. The terminal portion of the host name
identifies the type of institution or, for country-code endings, the country
in which the host is registered. For example, is the web address for the commercial business (.co)
called BBC residing in the United Kingdom (.uk).
.com      commercial institution
.biz      commercial institution
.net      network provider (now in general use)
.edu      educational institution
.org      not-for-profit organization
.gov      government institution
.mil      military
.int      international institution
.info     unrestricted use
.museum   museums
.name     names of individuals
.pro      lawyers, accountants, and doctors
.aero     aeronautical industry
.coop     cooperative organizations
.jobs     job advertisements
.mobi     mobile-device compatible sites
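Reading the institution type off a host name amounts to looking at the text after the last dot. The sketch below uses a few rows of the chart; institution_type is a hypothetical helper and the host names are placeholders.

```python
# A few rows of the chart above; extend the dictionary as needed.
INSTITUTION_TYPES = {
    "com": "commercial institution",
    "edu": "educational institution",
    "org": "not-for-profit organization",
    "gov": "government institution",
    "mil": "military",
}

def institution_type(host: str) -> str:
    """Return the institution type for the terminal portion of a host name."""
    ending = host.rstrip(".").rsplit(".", 1)[-1].lower()
    return INSTITUTION_TYPES.get(ending, "unknown")

print(institution_type("www.example.edu"))  # educational institution
print(institution_type("WWW.EXAMPLE.ORG"))  # not-for-profit organization
print(institution_type("www.example.xyz"))  # unknown
```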

