MINI PROJECT REPORT
PROJECT NAME :
WEB BROWSER & DOWNLOAD
Abhijeet Kumar Shah
A web browser is a software application for
retrieving, presenting, and traversing information
resources on the World Wide Web.
The World Wide Web (abbreviated as WWW or
W3, commonly known as the Web) is a system of
interlinked hypertext documents accessed via the
Internet. With a web browser, one can view web
pages that may contain text, images, videos, and
other multimedia, and navigate between them via
hyperlinks.
An information resource is identified by a Uniform
Resource Identifier (URI) and may be a web page,
image, video, or other piece of content. Hyperlinks
present in resources enable users to easily navigate
their browsers to related resources. A web browser
can also be defined as application software
designed to enable users to access, retrieve,
and view documents and other resources on
the Internet.
The major web browsers are Firefox, Google Chrome, Internet
Explorer, Opera, and Safari.
The first web browser, WorldWideWeb (later renamed Nexus),
was invented in 1990 by Sir Tim Berners-Lee.
In 1992, Robert Cailliau developed the first web browser for the
Macintosh, called Samba.
In 1994, Netscape built the first commercial web browser,
Mozilla 1.0, providing a major driver of the development of the
In 1993, Marc Andreessen invented Mosaic (later Netscape),
one of the first graphical web browsers and “the world's first
popular browser”. Mosaic introduced support for sound, video
clips, forms, bookmarks, and history files.
In 1994, the Opera browser was developed by a team of
researchers at a telecommunication company called Telenor
in Oslo, Norway. Opera was first made available on the
Internet in 1996. Opera has focused on the fast-growing mobile
phone web browser market, being preinstalled on over 40
million phones.
In 1995, Microsoft responded with its Internet Explorer, also
heavily influenced by Mosaic, initiating the industry's
first browser war.
The most recent major entrant to the browser market is
Google's Chrome, first released in September 2008. Chrome's
take-up has increased significantly year on year.
Apple's Safari had its first beta release in January 2003; as of
April 2011, it had a dominant share of Apple-based
web browsing, accounting for just over 7% of the entire
browser market.
The most commonly used browsers include Lynx (1993),
Chrome (2008), Opera (1995), IE (1995), and SeaMonkey (2005).
Historical Web Browsers
Active Worlds           MacWeb
EI*Net                  NetManage Chameleon
Enhanced NCSA Mosaic    PlanetWeb
GetRight                Quarterdeck WebC
IBM WebExplorer         Spyglass Enhanced Mosaic
internetMCI             TueV Mosaic for X
Back and forward buttons to go back to the previous resource and forward
respectively.
A refresh or reload button to reload the current resource.
A stop button to cancel loading the resource. In some browsers, the stop
button is merged with the reload button.
A home button to return to the user's home page.
An address bar to input the Uniform Resource Identifier (URI) of the desired
resource and display it.
A search bar to input terms into a search engine. In some browsers, the
search bar is merged with the address bar.
A status bar to display progress in loading the resource and also the URI of
links when the cursor hovers over them, and page zooming capability.
The user interface - this includes the address bar, back/forward
buttons, bookmarking menu, etc.: every part of the browser display
except the main window where you see the requested page.
The browser engine - marshals the actions between the UI and
the rendering engine.
The rendering engine - responsible for displaying the requested
content. For example if the requested content is HTML, it is
responsible for parsing the HTML and CSS and displaying the
parsed content on the screen.
Networking - used for network calls, like HTTP requests. It has a
platform-independent interface and underlying implementations
for each platform.
UI backend - used for drawing basic widgets like combo boxes and
windows. It exposes a generic interface that is not platform specific.
Underneath it uses the operating system user interface methods.
Data storage - this is a persistence layer. The browser needs to
save all sorts of data on the hard disk, for example, cookies. The
new HTML specification (HTML5) defines a 'web database', which is a
complete (although light) database in the browser.
It is important to note that Chrome, unlike most browsers, holds
multiple instances of the rendering engine - one for each tab. Each
tab is a separate process.
A web browser engine, also called a layout engine or rendering engine, is a
software component that takes marked-up content (such as HTML,
XML, image files, etc.) and formatting information (such as CSS, XSL,
etc.) and displays the formatted content on the screen.
The basic flow of the rendering engine
The rendering engine will start parsing the HTML document and turn
the tags into DOM (Document Object Model) nodes in a tree called the
"content tree".
The styling information together with visual instructions in the HTML
will be used to create another tree - the render tree.
The layout process gives each node the exact coordinates
where it should appear on the screen.
The next stage is painting - the render tree will be traversed and each
node will be painted using the UI backend layer.
Parsers usually divide the work between two components -
the lexer (tokenizer) that is responsible for breaking the
input into valid tokens, and the parser that is responsible
for constructing the parse tree by analyzing the document structure
according to the language syntax rules.
The parsing process is iterative. The parser will usually
ask the lexer for a new token and try to match the token
with one of the syntax rules. If a rule is matched, a node
corresponding to the token will be added to the parse tree
and the parser will ask for another token.
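The division of work between lexer and parser can be sketched in Python. This is only an illustration: it uses a tiny arithmetic grammar (integers, "+", "*" and parentheses) rather than HTML, and the names tokenize, Parser and parse, as well as the tuple-based parse tree, are assumptions made for brevity.

```python
def tokenize(text):
    """Lexer: break the input into (kind, value) tokens."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(("NUM", int(text[i:j])))
            i = j
        elif ch in "+*()":
            tokens.append(("OP", ch))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens


class Parser:
    """Parser: asks for one token at a time and matches it against the rules
    expr   := term ('+' term)*
    term   := factor ('*' factor)*
    factor := NUM | '(' expr ')'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", None)

    def next(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() == ("OP", "+"):
            self.next()
            node = ("+", node, self.term())  # rule matched: add a tree node
        return node

    def term(self):
        node = self.factor()
        while self.peek() == ("OP", "*"):
            self.next()
            node = ("*", node, self.factor())
        return node

    def factor(self):
        kind, value = self.next()
        if kind == "NUM":
            return value
        if (kind, value) == ("OP", "("):
            node = self.expr()
            if self.next() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek()[0] != "EOF":
        raise SyntaxError("trailing input")
    return tree


print(parse("2 + 3 * 4"))  # ('+', 2, ('*', 3, 4))
```

Each grammar rule is one method, so the parse tree mirrors the structure of the rules that matched.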
The "parse tree" is a tree of DOM (Document Object Model) element
and attribute nodes. The DOM is the object representation of the HTML
document and the interface of HTML elements to the outside world.
The DOM has an almost one-to-one relation to the markup.
<p>Hello World </p>
<div> <img src="example.png"/></div>
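The near one-to-one relation between markup and tree can be illustrated with Python's standard html.parser module. This is a rough sketch, not how a browser builds its DOM: the Node and TreeBuilder classes, the "#document" and "#text" labels, and the decision to drop whitespace-only text are all assumptions of this example.

```python
from html.parser import HTMLParser

VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}  # elements with no end tag


class Node:
    def __init__(self, tag, attrs=(), text=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM-like tree from start tags, end tags and text."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.open_nodes = [self.root]  # stack of elements still open

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.open_nodes[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.open_nodes.append(node)

    def handle_endtag(self, tag):
        if len(self.open_nodes) > 1 and self.open_nodes[-1].tag == tag:
            self.open_nodes.pop()

    def handle_data(self, data):
        if data.strip():  # ignore whitespace-only text in this sketch
            text_node = Node("#text", text=data.strip())
            self.open_nodes[-1].children.append(text_node)


def build_tree(markup):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as a list of indented lines, one per node."""
    label = repr(node.text) if node.tag == "#text" else node.tag
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


tree = build_tree('<p>Hello World </p>\n<div> <img src="example.png"/></div>')
print("\n".join(outline(tree)))
```

Every element in the markup becomes one node, and the src attribute ends up in the attrs dictionary of the img node.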
HTML cannot be parsed using the regular top-down or
bottom-up parsers. Instead, HTML has its own parsing algorithm,
which consists of two stages - tokenization and tree construction.
Tokenization is the lexical analysis, parsing the input into
tokens. Among HTML tokens are start tags, end tags,
attribute names and attribute values.
The tokenizer recognizes the token, gives it to the tree
constructor, and consumes the next character for recognizing
the next token, and so on until the end of the input.
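The tokenization loop can be sketched as a small state machine in Python. This is a heavily simplified illustration and not the real HTML tokenizer: it has only three states, assumes well-formed tags without attributes, and the function name tokenize_html and the token names are assumptions. The input string is also an assumption, chosen to match the tokens named in the tree construction walkthrough.

```python
def tokenize_html(html):
    """Yield (kind, value) tokens, consuming one character at a time."""
    state = "data"
    name = ""
    is_end_tag = False
    for ch in html:
        if state == "data":
            if ch == "<":
                state = "tag open"
            else:
                yield ("Character", ch)  # one token per character of text
        elif state == "tag open":
            if ch == "/":
                is_end_tag = True
            else:
                name = ch
                state = "tag name"
        elif state == "tag name":
            if ch == ">":
                kind = "EndTag" if is_end_tag else "StartTag"
                yield (kind, name.lower())  # hand the finished token on
                name, is_end_tag, state = "", False, "data"
            else:
                name += ch
    yield ("EOF", None)


tokens = list(tokenize_html("<html><body>Hello world</body></html>"))
for token in tokens:
    print(token)
```

The generator hands each token to its consumer as soon as it is recognized, which is the same hand-off the tokenizer makes to the tree constructor.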
Basic example - tokenizing the following HTML:
<html>
<body>
Hello world
</body>
</html>
Tree construction algorithm
The input to the tree construction stage is a sequence of tokens from
the tokenization stage.
The first mode is the "initial mode". Receiving the html token will
cause a move to the "before html" mode and a reprocessing of the
token in that mode. This will cause the creation of the
HTMLHtmlElement element, which will be appended to the root
Document object.
The state will be changed to "before head". We receive the "body"
token. An HTMLHeadElement will be created implicitly although we
don't have a "head" token and it will be added to the tree.
We now move to the "in head" mode and then to "after head". The
body token is reprocessed, an HTMLBodyElement is created and
inserted and the mode is transferred to "in body".
The character tokens of the "Hello world" string are now received.
The first one will cause creation and insertion of a "Text" node and
the other characters will be appended to that node.
Receiving the body end token will cause a transfer to "after
body" mode. We will now receive the html end tag, which will move
us to "after after body" mode. Receiving the end of file token will end
the parsing.
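The mode transitions in this walkthrough can be traced with a small Python sketch. It is a simplification made for illustration, not the full tree construction algorithm: it handles only the tokens of the walkthrough, stores text as plain strings rather than Text nodes, and the names Element, construct_tree and the token tuples are assumptions.

```python
class Element:
    def __init__(self, name):
        self.name = name
        self.children = []  # child Elements, or plain strings for text


def construct_tree(tokens):
    """Return (document, visited_modes) for a list of (kind, value) tokens."""
    document = Element("#document")
    open_elements = [document]
    mode = "initial"
    visited = [mode]

    def insert(name):
        element = Element(name)
        open_elements[-1].children.append(element)
        open_elements.append(element)

    def switch(new_mode):
        nonlocal mode
        mode = new_mode
        visited.append(new_mode)

    for token in tokens:
        kind, value = token
        while True:  # reprocess the same token until some mode consumes it
            if mode == "initial":
                switch("before html")
            elif mode == "before html":
                insert("html")
                switch("before head")
                if token == ("StartTag", "html"):
                    break
            elif mode == "before head":
                insert("head")  # created implicitly when no head tag is present
                switch("in head")
                if token == ("StartTag", "head"):
                    break
            elif mode == "in head":
                open_elements.pop()  # close the head element
                switch("after head")
                if token == ("EndTag", "head"):
                    break
            elif mode == "after head":
                insert("body")
                switch("in body")
                if token == ("StartTag", "body"):
                    break
            elif mode == "in body":
                if kind == "Character":
                    parent = open_elements[-1]
                    if parent.children and isinstance(parent.children[-1], str):
                        parent.children[-1] += value  # extend the text node
                    else:
                        parent.children.append(value)  # first character
                elif token == ("EndTag", "body"):
                    switch("after body")
                break
            elif mode == "after body":
                if token == ("EndTag", "html"):
                    switch("after after body")
                break
            else:  # "after after body": the end of file token ends parsing
                break
    return document, visited


tokens = [("StartTag", "html"), ("StartTag", "body")]
tokens += [("Character", ch) for ch in "Hello world"]
tokens += [("EndTag", "body"), ("EndTag", "html"), ("EOF", None)]
document, visited = construct_tree(tokens)
print(" -> ".join(visited))
```

The inner loop is what "reprocessing" means here: a mode may change the state without consuming the token, so the same token is examined again in the new mode.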
When the renderer is created and added to the tree, it does not have
a position and size. Calculating these values is called layout or
reflow.
HTML uses a flow based layout model, meaning that most of the time
it is possible to compute the geometry in a single pass. HTML tables
may require more than one pass. Layout can proceed left-to-right,
top-to-bottom through the document.
Layout is a recursive process. It begins at the root renderer, which
corresponds to the <html> element of the HTML document. Layout
computes geometric information for each renderer that requires it.
The position of the root renderer is 0,0 and its dimensions are the
viewport - the visible part of the browser window.
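A single-pass recursive layout can be sketched in Python as follows. The sketch assumes the simplest possible flow model: every box takes the full width of its parent and children are stacked vertically. The Renderer class, its content_height field (an intrinsic height supplied for leaf boxes) and the function names are assumptions of this example, not part of any browser's API.

```python
from dataclasses import dataclass, field


@dataclass
class Renderer:
    name: str
    content_height: int = 0  # intrinsic height of a leaf box (assumed known)
    children: list = field(default_factory=list)
    x: int = 0
    y: int = 0
    width: int = 0
    height: int = 0


def layout(renderer, x, y, width):
    """Place the renderer at (x, y), then lay out its children top to bottom."""
    renderer.x, renderer.y, renderer.width = x, y, width
    cursor_y = y
    for child in renderer.children:
        layout(child, x, cursor_y, width)  # recurse into the child
        cursor_y += child.height           # next child starts below it
    renderer.height = max(renderer.content_height, cursor_y - y)


def layout_document(root, viewport_width, viewport_height):
    """The root renderer sits at 0,0 and takes the viewport dimensions."""
    layout(root, 0, 0, viewport_width)
    root.height = viewport_height


p = Renderer("p", content_height=20)
img = Renderer("img", content_height=100)
div = Renderer("div", children=[img])
body = Renderer("body", children=[p, div])
root = Renderer("html", children=[body])
layout_document(root, 800, 600)
print(div.x, div.y, div.width, div.height)  # 0 20 800 100
```

Because each child's position depends only on the boxes before it, one pass over the tree is enough in this model; tables are the kind of case where that stops being true.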
Rendering Engines Used by Browsers
Boxely - for AOL applications
Gecko - for Firefox, Camino, K-Meleon, SeaMonkey, Netscape,
and other Gecko-based browsers.
GtkHTML - for Novell Evolution and other GTK+ programs
HTMLayout - embeddable HTML/CSS rendering engine -
component for Windows and Windows Mobile operating systems
KHTML - for Konqueror
NetFront - for Access NetFront
NetSurf - for NetSurf
Presto - for Opera 7 and above, Macromedia Dreamweaver MX and
MX 2004 (Mac), and Adobe Creative Suite 2.
Prince XML - for Prince XML.
Robin - for The Bat!
Tasman - for Internet Explorer 5 for Mac, Microsoft Office 2004 for
Mac, and Microsoft Office 2008 for Mac.
Trident - for Internet Explorer since version 4.0.
Tkhtml - for hv3
WebKit - for Google Chrome, iOS, Safari, Arora, Midori, OmniWeb,
Shiira, iCab since version 4, Web, SRWare Iron, Rekonq, and
in Maxthon 3.
A download manager is a computer program dedicated to the task
of downloading files from the Internet for storage.
The typical download manager at a minimum provides a means to
recover from errors without losing the work already completed, and
can optionally split the file to be downloaded into two or more
segments, which are then transferred in parallel, potentially making the
process faster within the limits of the available bandwidth.
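Segmented downloading can be sketched in Python without touching the network. In this illustration, split_ranges divides a file of known size into byte ranges, range_header formats one range as the value of an HTTP Range header, and download_segments fetches the ranges in parallel threads. The fetch callable is a hypothetical stand-in for whatever actually retrieves the bytes of one range; all function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_size, segments):
    """Return inclusive (start, end) byte ranges that cover total_size bytes."""
    if total_size <= 0 or segments <= 0:
        raise ValueError("total_size and segments must be positive")
    segments = min(segments, total_size)
    base, extra = divmod(total_size, segments)
    ranges, start = [], 0
    for i in range(segments):
        length = base + (1 if i < extra else 0)  # spread the remainder
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_header(start, end):
    """Format one range as the value of an HTTP Range request header."""
    return f"bytes={start}-{end}"


def download_segments(total_size, segments, fetch):
    """Fetch every range in its own thread and join the parts in order.

    `fetch(start, end)` is a hypothetical callable returning the bytes
    from start to end inclusive.
    """
    ranges = split_ranges(total_size, segments)
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        parts = list(pool.map(lambda r: fetch(*r), ranges))
    return b"".join(parts)


# Stand-in for a remote file, so the sketch runs without a network.
remote_file = bytes(range(10))
result = download_segments(10, 3, lambda s, e: remote_file[s:e + 1])
print(split_ranges(10, 3), result == remote_file)
```

pool.map returns results in the order of the input ranges, so the parts can be joined directly even if the segments finish out of order.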
Multi-source is the name given to files that are downloaded in
parallel.
Pausing the download of large files and connecting again later to continue.
Downloading files on poor connections, especially for slow networks.
Downloading several files from a site automatically according to
Mirror downloads, that is, downloading the same file from
several mirror sites.
Scheduled downloads (including automatic hang-up and shutdown).
Limiting the download speed while maintaining good stability.
Automatic subfolder generation.
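Pausing and continuing a download over HTTP usually relies on the Range request header. The Python sketch below only builds the request and sends nothing. The function name resume_request, the bytes_already_saved parameter and the example URL are assumptions for illustration.

```python
import urllib.request


def resume_request(url, bytes_already_saved):
    """Build a request that asks only for the part of the file still missing."""
    request = urllib.request.Request(url)
    if bytes_already_saved > 0:
        # Ask the server to start at the first byte we do not yet have.
        request.add_header("Range", f"bytes={bytes_already_saved}-")
    return request


# Hypothetical URL; 5000 bytes are assumed to be on disk already.
request = resume_request("http://example.com/large-file.bin", 5000)
print(request.get_header("Range"))  # bytes=5000-
```

A server that supports range requests answers with status 206 (Partial Content). A server that does not will typically send the whole file again with status 200, so a download manager has to check the status before appending to the partial file.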
Download Accelerator Plus - Speeds up file downloads and resumes
interrupted downloads. Features include file preview, file shredder
and top downloads list.
FlashGet - Automatically splits files into sections, and downloads
each split simultaneously. Download jobs can be placed in
specifically-named categories for quick access.
Internet Download Accelerator - Integrates with Internet Explorer,
Firefox, Mozilla, Opera, Netscape and others. You can download and
save video from popular video sharing services: YouTube, Google
Video, Metacafe and others.
Internet Download Manager - Accelerate downloads, resume broken or
interrupted downloads, and schedule downloads. The program
features dynamic file segmentation and download logic optimizer to
achieve better download speed and higher Internet connection
TubeTilla Pro - Download YouTube videos and convert them to various
formats like wmv, mp4 and mp3.
Video Get - Downloads video from YouTube and others. Converts video
to a variety of video formats.
WebPix - Automatically downloads pictures from a web site, lets you view them
quickly and browse thumbnails in an instant.
Download managers support different protocols, such as:
The Hypertext Transfer Protocol (HTTP) is an application
protocol for distributed, collaborative, hypermedia information
systems.
Hypertext Transfer Protocol Secure (HTTPS) is a widely-
used communications protocol for secure communication over
a computer network, with especially wide deployment on the Internet.
File Transfer Protocol (FTP) is a standard network protocol used to
transfer files from one host to another host over a TCP-based
network, such as the Internet.
Microsoft Media Server (MMS) is the name of
Microsoft's proprietary network streaming protocol used to
transfer unicast data in Windows Media Services (previously
called NetShow Services). MMS can be transported
via UDP or TCP. The MMS default port is UDP/TCP 1755.
The Real Time Streaming Protocol (RTSP) is a network
control protocol designed for use in entertainment and
communications systems to control streaming media servers. The
protocol is used for establishing and controlling media sessions
between end points.
Magnet links mainly refer to resources available for download
via peer-to-peer networks.
Real Time Messaging Protocol (RTMP) was initially a proprietary
protocol developed by Macromedia for streaming audio, video and
data over the Internet, between a Flash player and a server.
BitTorrent is a peer-to-peer file sharing protocol used for distributing
large amounts of data over the Internet.
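A download manager has to decide which protocol handler a given link needs. One plausible way to do this, sketched in Python, is to look at the URL scheme. The classify function, the label strings and the rule that a path ending in ".torrent" means BitTorrent are assumptions of this example, and the URLs are hypothetical.

```python
from urllib.parse import urlparse

# Map from URL scheme to the protocol names used in the list above.
SCHEME_TO_PROTOCOL = {
    "http": "HTTP",
    "https": "HTTPS",
    "ftp": "FTP",
    "mms": "MMS",
    "rtsp": "RTSP",
    "rtmp": "RTMP",
    "magnet": "Magnet link",
}


def classify(url):
    """Return the name of the protocol a download manager would need."""
    parts = urlparse(url)
    if parts.path.lower().endswith(".torrent"):
        return "BitTorrent"  # assumed rule: torrent metadata file
    return SCHEME_TO_PROTOCOL.get(parts.scheme, "unsupported")


for url in (
    "https://example.com/file.zip",
    "rtsp://example.com/stream",
    "magnet:?xt=urn:btih:0123456789abcdef",
    "http://example.com/linux.torrent",
):
    print(url, "->", classify(url))
```

Keeping the mapping in a dictionary means a new protocol can be added without changing the dispatch logic.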
For dial-up users, download managers can automatically dial the Internet Service
Provider at night, when rates or tariffs are usually much lower,
download the specified files, and hang up. They can record which
links the user clicks on during the day, and queue these files for later
download.
For broadband users, download managers can help download very
large files by resuming broken downloads, by limiting
the bandwidth used, so that other internet activities are not affected
(slowed) and the server is not overloaded, or by automatically
navigating a site and downloading pre-specified content (photo
galleries, MP3 collections, etc.).
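Limiting the bandwidth used can be sketched in Python as a throttle that sleeps whenever the transfer is running ahead of the allowed average rate. This is one simple approach among several, not a description of how any particular download manager does it. The Throttle class and its parameters are assumptions, and the clock and sleep functions are injectable so that the arithmetic can be checked without waiting.

```python
import time


class Throttle:
    """Keep the average transfer rate at or below a cap in bytes per second."""

    def __init__(self, max_bytes_per_second, clock=time.monotonic, sleep=time.sleep):
        self.rate = max_bytes_per_second
        self.clock = clock
        self.sleep = sleep
        self.started = clock()
        self.transferred = 0

    def consume(self, nbytes):
        """Record nbytes as transferred and pause if we are ahead of schedule."""
        self.transferred += nbytes
        # Time the transfer should have taken so far at the capped rate.
        scheduled = self.transferred / self.rate
        elapsed = self.clock() - self.started
        delay = scheduled - elapsed
        if delay > 0:
            self.sleep(delay)
            return delay
        return 0.0


# Demonstration with a frozen clock and a recording sleep function.
pauses = []
throttle = Throttle(1000, clock=lambda: 0.0, sleep=pauses.append)
first = throttle.consume(500)   # 500 bytes at 1000 B/s should take 0.5 s
second = throttle.consume(500)  # 1000 bytes in total should take 1.0 s
print(first, second)
```

In a real download loop, consume would be called after each chunk is read from the network, with the default clock and sleep left in place.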