2. Trainings by Vidya Bhagwat
Websites:
A website is hosted on at least one web server and is accessible via a network such as the Internet or a private local area network through an Internet address known as a Uniform Resource Locator (URL). All publicly accessible websites collectively constitute the World Wide Web.
Importance of websites:
• Internet marketing comes of age.
• Internet marketing is now a major, multi-billion-dollar industry.
• Despite some concerns, many consumers now have the skills and the confidence to make purchases on the web.
• Local businesses are affected as well.
• Many small-business operators have been disappointed with the results achieved by their websites.
• Sites have been created, but little if any business has resulted.
• There are a number of reasons:
  • unrealistic expectations;
  • poor website construction (not search-engine friendly);
  • poor targeting.
• Local search is growing in importance. Local search is the ability to search for and find businesses and organizations in the local area, that is, in close geographic proximity.
• The importance of local search will vary from business to business.
Website structure understanding:
• Website Structure Understanding and its Applications
• Website structure understanding can be treated as a reverse-engineering problem: automatically discovering the layout templates and URL patterns of a website, and understanding how these templates and patterns are integrated to organize the website. The study of this problem has had a great impact on many applications, which can leverage such site-level knowledge to help web search and data mining.
• What’s Website Structure?
• In this project, the website structure consists of three components: layout templates, URL patterns, and link structure.
• Layout Template:
• Most web pages consist of HTML elements such as tables, menus, buttons, images, and input boxes. The layout of a web page describes which HTML elements are included in the page, as well as how these elements are visually distributed when the page is rendered. Essentially, a page layout is represented by a so-called DOM (Document Object Model) tree. In this project, a layout template is a group of pages that have very similar layouts (DOM trees).
• Link Structure
• Based on the layout templates and URL patterns, we can construct a directed graph to represent the website's organization structure. Each layout template is considered a node in the graph, and two nodes are linked if there are hyperlinks between pages belonging to the two nodes. The link direction is the same as that of the related hyperlinks, and each link is characterized by the URL pattern of the corresponding hyperlink URLs. Note that there can be multiple links from one node to another if the corresponding hyperlinks match more than one URL pattern.
• Fig. 2 gives an illustrative example of the sub-graph constructed from the layout templates and URL patterns above.
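The link-structure graph can be sketched in a few lines of Python. This is a minimal sketch, assuming each hyperlink has already been reduced to a (source template, target template, URL pattern) triple; the function `build_link_graph` and the template and pattern names are hypothetical. Storing a set of patterns per ordered node pair reproduces the rule that one pair of nodes can carry several links, one per distinct URL pattern.

```python
from collections import defaultdict

def build_link_graph(hyperlinks):
    """Build a directed graph whose nodes are layout templates.

    hyperlinks: iterable of (source_template, target_template, url_pattern)
    triples, one per hyperlink between two pages (hypothetical input format).
    Returns {source: {target: set_of_url_patterns}}; each distinct URL
    pattern between the same ordered pair of nodes counts as one link.
    """
    graph = defaultdict(lambda: defaultdict(set))
    for src, dst, pattern in hyperlinks:
        graph[src][dst].add(pattern)  # edge direction follows the hyperlink
    return {src: dict(targets) for src, targets in graph.items()}

# Hypothetical templates and URL patterns for a small site
links = [
    ("list_page", "detail_page", "/article/*.html"),
    ("list_page", "detail_page", "/article/*.html"),  # same pattern: same link
    ("list_page", "detail_page", "/print?id=*"),      # new pattern: second link
    ("detail_page", "list_page", "/list?page=*"),
]
graph = build_link_graph(links)
for src, targets in graph.items():
    for dst, patterns in targets.items():
        print(src, "->", dst, sorted(patterns))
```

Here `list_page` has two links to `detail_page` because its hyperlinks fall under two URL patterns.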
• Random Sampling
• The goal of random sampling is to provide a snapshot of a website by downloading only a relatively small number of pages. The sampling quality is the foundation of the whole mining process. To keep the downloaded pages as diverse as possible, the sampling process in practice combines breadth-first and depth-first strategies, so it can quickly reach pages at deep levels within a few steps.
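One simple way to mix breadth-first and depth-first crawling is to keep a single frontier and, at each step, take either its oldest entry (breadth-first) or its newest entry (depth-first). The sketch below only illustrates that idea and is not the project's sampler: it assumes the site's link graph is already available as an in-memory dict instead of downloading pages, and the names `hybrid_sample` and `depth_bias` are hypothetical.

```python
import random
from collections import deque

def hybrid_sample(links, start, budget, depth_bias=0.5, seed=0):
    """Sample up to `budget` pages reachable from `start`.

    links: dict mapping a page URL to the URLs it links to; it stands in
    for downloading and parsing real pages (an assumption of this sketch).
    Each step takes the newest frontier entry (depth-first) with
    probability `depth_bias`, otherwise the oldest one (breadth-first).
    """
    rng = random.Random(seed)         # seeded for reproducible samples
    frontier = deque([start])
    seen = {start}
    sampled = []
    while frontier and len(sampled) < budget:
        if rng.random() < depth_bias:
            page = frontier.pop()      # depth-first step
        else:
            page = frontier.popleft()  # breadth-first step
        sampled.append(page)
        out_links = list(links.get(page, []))
        rng.shuffle(out_links)         # do not always favour the first link
        for url in out_links:
            if url not in seen:
                seen.add(url)
                frontier.append(url)
    return sampled

# Hypothetical site: one deep branch (/a/...) and two shallow pages
site = {
    "/": ["/a", "/b", "/c"],
    "/a": ["/a/1"],
    "/a/1": ["/a/1/x"],
    "/a/1/x": [],
    "/b": [],
    "/c": [],
}
sample = hybrid_sample(site, "/", budget=4)
print(sample)
```

With `depth_bias=0` this reduces to plain breadth-first sampling and with `depth_bias=1` to depth-first; intermediate values trade breadth against reaching deep pages quickly.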
• Inspired by this observation, in this project DOM paths are used to characterize the layout of a web page. As shown in Fig. 5, a DOM path is a path from a leaf node to the root of the DOM tree. The leaf node indicates the component type, and the path to the root approximately describes the visual location of that component when the page is rendered.
• Given a set of HTML pages, all unique DOM paths are extracted to form a feature space. Each page is represented as a point in this feature space, so the layout similarity of two pages can be estimated. A bottom-up strategy is then used to group similar pages, and each resulting cluster is considered a layout template.
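The DOM-path features and the bottom-up grouping can be sketched with Python's standard `html.parser` module. The source does not state the similarity measure or the stopping rule, so this sketch assumes Jaccard similarity between the pages' DOM-path sets, average-linkage agglomerative clustering, and a hypothetical merge threshold; all class and function names are illustrative. Paths are written leaf first (for example `a/div/body/html`), matching the leaf-to-root definition above.

```python
from html.parser import HTMLParser
from itertools import combinations

# Elements that never have an end tag, so they are always leaves
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class DomPathExtractor(HTMLParser):
    """Collect the unique leaf-to-root DOM paths of one HTML page.

    Elements still open when the document ends are not recorded.
    """

    def __init__(self):
        super().__init__()
        self.stack = []      # open elements as [tag, has_child_element]
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        if self.stack:
            self.stack[-1][1] = True          # the parent is not a leaf
        if tag in VOID_TAGS:
            self._record([t for t, _ in self.stack] + [tag])
        else:
            self.stack.append([tag, False])

    def handle_endtag(self, tag):
        if all(t != tag for t, _ in self.stack):
            return                            # stray end tag: ignore it
        while self.stack:                     # also closes unclosed children
            path = [t for t, _ in self.stack]
            t, has_child = self.stack.pop()
            if not has_child:
                self._record(path)
            if t == tag:
                break

    def _record(self, root_to_leaf):
        self.paths.add("/".join(reversed(root_to_leaf)))  # leaf first

def dom_paths(html):
    parser = DomPathExtractor()
    parser.feed(html)
    parser.close()
    return parser.paths

def jaccard(a, b):
    """Layout similarity of two pages given their DOM-path sets."""
    return len(a & b) / len(a | b) if a | b else 1.0

def cluster_layouts(pages, threshold=0.5):
    """Agglomerative (bottom-up) grouping of pages into layout templates.

    pages: dict mapping page id -> set of DOM paths. Starts with one
    cluster per page and merges the two clusters with the highest average
    pairwise Jaccard similarity until no pair reaches `threshold`.
    """
    clusters = [[pid] for pid in pages]

    def avg_sim(c1, c2):
        sims = [jaccard(pages[p], pages[q]) for p in c1 for q in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), avg_sim(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[i].extend(clusters[j])
        del clusters[j]                       # j > i, so index i stays valid
    return clusters

# Hypothetical pages: two article-style layouts and two list-style layouts
html_pages = {
    "article1": "<html><body><div><h1>A</h1><p>x</p><img src='a.png'></div></body></html>",
    "article2": "<html><body><div><h1>B</h1><p>y</p><p>z</p><img src='b.png'></div></body></html>",
    "list1": "<html><body><ul><li><a href='/1'>1</a></li></ul></body></html>",
    "list2": "<html><body><ul><li><a href='/2'>2</a></li></ul><p>footer</p></body></html>",
}
features = {pid: dom_paths(html) for pid, html in html_pages.items()}
print(cluster_layouts(features, threshold=0.4))
```

Raising the threshold yields more, tighter templates; lowering it merges more layouts together.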
• URL Pattern Discovery
• A URL is not an ordinary string: it has a syntactic structure strictly defined by Internet standards (RFC 3986). Based on this syntax, a URL string can be represented as a group of key-value pairs. Fig. 6 gives an example URL, its syntax structure, and the corresponding key-value pairs.
It can be observed that different URL components (or keys) usually have different functions and play different roles in a website. In general, keys denoting directories, functions, and document types take only a few values, which should be explicitly recorded in a URL pattern. By contrast, keys denoting parameters such as user names take highly diverse values, which should be generalized in the pattern.
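The decomposition of a URL into key-value pairs can be illustrated with `urllib.parse` from the Python standard library. Fig. 6 is not reproduced here, so the key names used below (`scheme`, `host`, `path0`, `path1`, and so on, plus `query:<name>`) are a hypothetical naming scheme, and the URL is an invented example.

```python
from urllib.parse import urlsplit, parse_qsl

def url_to_pairs(url):
    """Split a URL into (key, value) pairs following its syntax structure.

    Key names are a hypothetical convention: 'scheme', 'host', 'path0',
    'path1', ... for path segments by depth, and 'query:<name>' for
    query parameters.
    """
    parts = urlsplit(url)
    pairs = [("scheme", parts.scheme), ("host", parts.hostname or "")]
    segments = [s for s in parts.path.split("/") if s]
    for depth, segment in enumerate(segments):
        pairs.append((f"path{depth}", segment))
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        pairs.append((f"query:{name}", value))
    return pairs

for key, value in url_to_pairs("http://www.example.com/blog/2010/post.php?user=alice&lang=en"):
    print(f"{key} = {value}")
```

Across a whole site, keys such as `path0` (a directory) or `path2` (a script or document name) would typically take few values, while a parameter key such as `query:user` would vary widely.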
• Based on this observation, a top-down recursive split process is proposed in this project to construct a pattern tree that characterizes a set of URLs. Fig. 7 gives an example pattern tree based on URLs from www.wretch.cc. For algorithm details, please refer to.
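The exact split procedure is not given in this section, so the following is only a simplified sketch of a top-down recursive split under stated assumptions, not the project's published algorithm. It assumes URLs are already dicts of key-value pairs. A key with a single value is recorded explicitly. A key with more than a hypothetical `max_values` distinct values is generalized to `*`. The remaining key with the fewest distinct values is used to split the URL set into child nodes. `build_pattern_tree` and `show` are illustrative names.

```python
from collections import defaultdict

def build_pattern_tree(urls, max_values=3, pattern=None):
    """Recursively split a set of URLs into a pattern tree (simplified).

    urls: list of dicts mapping URL keys (e.g. 'path0') to values.
    Returns {'pattern': dict, 'count': int, 'children': list}.
    """
    pattern = dict(pattern or {})
    keys = sorted({k for u in urls for k in u} - set(pattern))
    split_key, split_values = None, None
    for key in keys:
        values = {u.get(key) for u in urls}
        if len(values) == 1:
            pattern[key] = values.pop()            # constant: record explicitly
        elif len(values) > max_values:
            pattern[key] = "*"                     # diverse: generalize
        elif split_values is None or len(values) < len(split_values):
            split_key, split_values = key, values  # few values: split candidate
    node = {"pattern": pattern, "count": len(urls), "children": []}
    if split_key is None:
        return node                                # leaf of the pattern tree
    groups = defaultdict(list)
    for u in urls:
        groups[u.get(split_key)].append(u)
    for value, group in groups.items():
        child_pattern = {**pattern, split_key: value}
        node["children"].append(build_pattern_tree(group, max_values, child_pattern))
    return node

def show(node, indent=0):
    print(" " * indent + f"{node['pattern']} ({node['count']} URLs)")
    for child in node["children"]:
        show(child, indent + 2)

# Hypothetical URLs such as /user/<name>/profile.php and /user/<name>/photos.php
urls = [{"path0": "user", "path1": name, "path2": page}
        for page in ("profile.php", "photos.php")
        for name in ("alice", "bob", "carol", "dave")]
tree = build_pattern_tree(urls)
show(tree)
```

In this example the user-name segment is generalized to `*`, while the two script names become separate branches of the tree.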
• Website Designing India has assisted hundreds of businesses to build or update websites customized to their requirements. You get more than just a website with our Website Designing Services: you can update your website content easily, take credit card payments online, and use tools such as poll managers, news managers, photo galleries, and form builders. Whether you're looking for an ecommerce web design company or a web development company that showcases your business, our website design and development services give you control over your site with no technical skills needed.
Domain name:
• A domain name is a unique name that identifies a website. It is an identification string that defines a realm of administrative autonomy, authority, or control on the Internet. Domain names are formed by the rules and procedures of the Domain Name System (DNS). Any name registered in the DNS is a domain name.
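As an illustration of the kind of rules DNS names follow, the sketch below checks a name against the traditional hostname syntax (RFC 1035 and RFC 1123). Each dot-separated label must be 1 to 63 letters, digits, or hyphens and must not start or end with a hyphen, and the whole name must be at most 253 characters, excluding a trailing dot. These hostname rules are stricter than what DNS itself allows (for example, underscores appear in some record names), and internationalized names would first need converting to their ASCII form, so treat `is_valid_hostname` as a hypothetical helper for illustration only.

```python
import re

# One label: 1-63 letters, digits or hyphens, not starting or ending with "-"
LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")

def is_valid_hostname(name):
    """Check a name against the traditional LDH hostname rules."""
    if name.endswith("."):
        name = name[:-1]              # a trailing dot marks a fully qualified name
    if not name or len(name) > 253:
        return False
    return all(LABEL_RE.fullmatch(label) for label in name.split("."))

for candidate in ("www.example.net", "-bad.example.net", "example..net"):
    print(candidate, is_valid_hostname(candidate))
```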
• Domain names are used in various networking contexts and for application-specific naming and addressing purposes. In general, a domain name represents an Internet Protocol (IP) resource, such as a personal computer used to access the Internet, a server computer hosting a web site, the web site itself, or any other service communicated via the Internet. In 2010, the number of active domains reached 196 million.
Use in web site hosting
• The domain name is a component of a Uniform Resource
Locator (URL) used to access web sites, for example:
• URL: http://www.example.net/index.html
• Top-level domain name: net
• Second-level domain name: example.net
• Host name: www.example.net
• A domain name may point to multiple IP addresses to provide server redundancy for the services offered; such multi-address capability is used to manage the traffic of large, popular web sites. More commonly, however, one server computer at a given IP address hosts web sites in several different domains. Such address overloading enables virtual web hosting, commonly used by large web hosting services to conserve IP address space. IP-address overloading relies on a feature of HTTP version 1.1 that HTTP version 1.0 lacks: every HTTP/1.1 request must identify the domain name being requested (in the Host header).
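Name-based virtual hosting can be sketched as a lookup on the Host header. This minimal sketch assumes a hypothetical table `SITES` mapping host names to document roots on one server, and it parses a raw request string instead of reading from a real socket. `route_request` is an illustrative name, not a server API. It returns 400 for an HTTP/1.1 request without a Host header, as HTTP/1.1 requires. For an HTTP/1.0 request that omits the header, it has no way to tell the sites apart.

```python
# Hypothetical table: several web sites served from one IP address
SITES = {
    "www.example.net": "/srv/example-net",
    "www.example.org": "/srv/example-org",
}

def route_request(raw_request):
    """Return (status, file_path) for a raw HTTP request, chosen by Host."""
    head = raw_request.split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    _, target, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Drop an optional ":port" suffix (IPv6 literals are not handled here)
    host = headers.get("host", "").split(":")[0].lower()
    if version == "HTTP/1.1" and not host:
        return 400, None          # HTTP/1.1 requires a Host header
    root = SITES.get(host)
    if root is None:
        return 404, None          # unknown site, or no Host to tell sites apart
    return 200, root + target

print(route_request("GET /index.html HTTP/1.1\r\nHost: www.example.org\r\n\r\n"))
```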
Contact Information
• To obtain further information about any of our databases, services, or programs, contact NCBI:
PubMed Customer Service:
• Send an email for help with technical issues, searching, or content assistance.
• Call 1-888-FIND-NLM (1-888-346-3656) for help with searching or content assistance only.
• General Information: info@ncbi.nlm.nih.gov
  • Questions about and technical support for NCBI and its programs and services
• BLAST: blast-help@ncbi.nlm.nih.gov
  • Technical questions on running or interpreting BLAST sequence comparison searches