By: Shireen Fatima
Guide: Dr. Siddhartha Ghosh
Web Mining: Accomplishments & Future Directions, by Jaideep Srivastava
Mining the Web: Discovering Knowledge from Hypertext Data, by Soumen Chakrabarti
Web Mining Today and Tomorrow, by Kavita Sharma and Vikas Kumar
Web Mining taxonomy
Solution to Search Engine Problem
Web Mining through cloud computing
Data mining: turn data into knowledge.
Web mining is the application of data mining
techniques to find interesting and potentially
useful knowledge from web data.
Web data is:
Web content – text, images, records, etc.
Web structure – hyperlinks, tags, etc.
Web usage – HTTP logs, app server logs, etc.
Personalized customer experience in e-commerce - Amazon.com
Web search - Google
Web-wide tracking - DoubleClick
Understanding Web communities - AOL
Understanding auction behavior - eBay
Personalized portal for the Web - My Yahoo
Information filtering techniques try to learn
about users’ interests based on their evaluation
and actions, and then to use this information to
analyze new documents.
It increases the value of each visitor and improves the
visitor’s experience at the website.
Web mining is attractive to companies because
of several advantages. In the most general sense,
it can contribute to an increase in profit.
Information is huge.
Information is diverse.
Information is redundant.
Discovery of useful information from web
contents / data / documents
The data mining techniques applied are:
-A source of textual
e.g., how many words
are common in these
Several clusters of documents
that are relevant to each other
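The idea of grouping documents by how many words they share can be sketched as follows. This is a minimal illustration, not the method used in the slides: the Jaccard measure, the greedy grouping, the 0.3 threshold, and the sample documents are all assumptions chosen for the example.

```python
def word_set(text):
    """Lowercase a document and return its set of distinct words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Share of distinct words that two documents have in common."""
    wa, wb = word_set(a), word_set(b)
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def cluster(docs, threshold=0.3):
    """Greedy single-pass grouping: a document joins the first cluster whose
    first member is similar enough, otherwise it starts a new cluster."""
    clusters = []
    for doc in docs:
        for group in clusters:
            if jaccard(doc, group[0]) >= threshold:
                group.append(doc)
                break
        else:
            clusters.append([doc])
    return clusters


# Hypothetical sample documents.
docs = [
    "web mining finds patterns in web data",
    "web mining finds knowledge in web logs",
    "cheap flights and hotel deals",
    "hotel deals and cheap car rental",
]

for group in cluster(docs):
    print(group)
```

With these sample documents the two web-mining texts end up in one cluster and the two travel texts in another.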
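X => Y
where X and Y are sets of items,
support is P(X ^ Y), the fraction of transactions that contain both X and Y,
confidence is P(Y | X), the fraction of transactions containing X that also contain Y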
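A small worked example may help make the rule X => Y concrete. The sketch below assumes the usual definitions: support is the fraction of transactions that contain every item of X and Y, and confidence is the fraction of transactions containing X that also contain Y. The page paths and the transactions are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, x, y):
    """Fraction of transactions containing X that also contain Y."""
    support_x = support(transactions, x)
    if support_x == 0:
        return 0.0
    return support(transactions, set(x) | set(y)) / support_x


# Hypothetical transactions: the set of pages each client requested.
transactions = [
    {"/products/", "/products/software/", "/order/"},
    {"/products/", "/products/software/"},
    {"/products/", "/about/"},
    {"/products/software/", "/order/"},
]

x = {"/products/software/"}
y = {"/order/"}

print("support:", support(transactions, x | y))
print("confidence:", confidence(transactions, x, y))
```

Here two of the four transactions contain both pages, and three contain the software page, so the rule has support 2/4 and confidence 2/3.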
Classification is the task of generalizing known
structure to apply to new data.
For example, an e-mail program might attempt to
classify an e-mail as "legitimate" or as "spam".
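The e-mail example can be sketched with a very small word-count classifier. The slides do not name a specific algorithm; naive Bayes with add-one smoothing is used here only as an assumption, and the training messages are invented for illustration.

```python
import math
from collections import Counter


def train(examples):
    """Count words per label and messages per label from (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-score."""
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    total_messages = sum(label_counts.values())

    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_messages)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled messages (the "known structure").
training = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "legitimate"),
    ("project report attached for review", "legitimate"),
]

model = train(training)
print(classify("claim your free prize", *model))
print(classify("agenda for the project meeting", *model))
```

The labelled messages play the role of the known structure, and `classify` applies what was learned from them to new, unlabelled messages.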
The structure of a typical Web graph consists of Web
pages as nodes and hyperlinks as edges connecting
related pages.
Web Structure Mining is the process of discovering
structure information from the Web.
Web-graph: A directed graph that represents the Web.
Node: Each Web page is a node of the Web-graph.
Link: Each hyperlink on the Web is a directed edge of the Web-graph.
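The Web-graph definitions above map directly onto a small data structure. The sketch below stores the graph as a dictionary from each page to the set of pages it links to, then counts incoming links per page. The page names and the function names are hypothetical.

```python
def build_web_graph(links):
    """Build a directed graph: each page maps to the set of pages it links to."""
    graph = {}
    for source, target in links:
        graph.setdefault(source, set()).add(target)
        # Pages that only receive links are still nodes of the graph.
        graph.setdefault(target, set())
    return graph


def in_degrees(graph):
    """Count how many hyperlinks point at each page."""
    counts = {page: 0 for page in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


# Hypothetical hyperlinks as (source page, target page) pairs.
links = [
    ("index.html", "products.html"),
    ("index.html", "about.html"),
    ("products.html", "index.html"),
    ("about.html", "products.html"),
]

graph = build_web_graph(links)
degrees = in_degrees(graph)
for page in sorted(graph):
    print(page, "out:", len(graph[page]), "in:", degrees[page])
```

Counting incoming links is one of the simplest kinds of structure information that can be read off such a graph.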
It deals with understanding user behavior in
interacting with the web or with a website.
Its goal is to obtain information that may help web
sites reorganize or adapt to
better suit the user.
Clustering and Classification
clients who often access /products/software/webminer.html
tend to be from educational institutions.
clients who placed an online order for software tend to be
students in the 20-25 age group
75% of clients who download software from
/products/software/demos/ visit between 7:00 and 11:00
pm on weekends
Sequential patterns - A set of items is followed by
another item in time-order
Web usage examples
30% of clients who visited /products/software/, had done a
search in Yahoo using the keyword “software” before their
60% of clients who placed an online order for WEBMINER,
placed another online order for software within 15 days
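Percentages like the ones in these examples can be computed from per-client page sequences. The sketch below assumes the usage log has already been grouped into one time-ordered list of pages per client. The client names, the paths, and the function name are hypothetical.

```python
def sequential_confidence(sessions, first, then):
    """Among clients who visited `first`, the fraction who visited `then`
    at some later point in their time-ordered page sequence."""
    with_first = 0
    followed = 0
    for pages in sessions.values():
        if first in pages:
            with_first += 1
            start = pages.index(first)
            if then in pages[start + 1:]:
                followed += 1
    return followed / with_first if with_first else 0.0


# Hypothetical time-ordered page sequences, one per client.
sessions = {
    "client_a": ["/", "/products/software/", "/order/"],
    "client_b": ["/products/software/", "/about/"],
    "client_c": ["/order/", "/products/software/"],
    "client_d": ["/", "/about/"],
}

share = sequential_confidence(sessions, "/products/software/", "/order/")
print(f"{share:.0%} of clients who visited the software page ordered afterwards")
```

Order matters here: client_c requested both pages, but the order came before the software page, so that session does not count as a match.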
Because search engines draw on the enormous amount of
information in web sites and web pages, it is a
challenging task to engineer, implement, and
improve a search engine.
Web mining helps with the problem of dealing effectively with an
uncontrolled hypertext collection where anyone can
publish anything they want.
Web mining applications are used by web sites for
web search (e.g., Google and Yahoo),
web recommendations (e.g., Amazon.com),
web advertising (e.g., Google and Yahoo), and
web site design (e.g., landing page optimization).
Cloud Computing is clearly one of today's most
seductive technology areas due at least in part to its
cost efficiency and flexibility.
Cloud mining is a new approach that provides a faceted search
interface for your data. SaaS (Software-as-a-Service)
is used to reduce the cost of web mining and tries
to provide the security that comes with cloud mining
Web Mining fills the information gap between web users
and web designers
Many successful techniques have been developed for
mining the web
Cloud mining is an improved method for web mining
The need for discovering new methods and techniques to
handle the amounts of data existing in this universe will