By: Shireen Fatima ()
Guide: Dr. Siddhartha Ghosh


Web Mining: Accomplishments & Future Directions
by Jaideep Srivastava

Mining the Web: Discovering Knowledge from
Hypertext Data by Soumen Chakrabarti

Web Mining today and tomorrow by Kavita Sharma
and Vikas Kumar








Introduction
Applications
Challenges
Web Mining taxonomy
Solution to Search Engine Problem
Web Mining through cloud computing
Conclusion


Data mining: turn data into knowledge.



Web mining is the application of data mining
techniques to find interesting and potentially
useful knowledge from web data.
Web data consists of:

Web content: text, images, records, etc.

Web structure: hyperlinks, tags, etc.

Web usage: HTTP logs, application server logs, etc.


Personalized customer experience in e-commerce: Amazon.com

Web search: Google

Web-wide tracking: DoubleClick

Understanding Web communities: AOL

Understanding auction behavior: eBay

Personalized portal for the Web: My Yahoo


Information filtering techniques try to learn
about users’ interests based on their evaluation
and actions, and then to use this information to
analyze new documents.



It increases the value of each visitor and improves the
visitor's experience at the website.

Web mining is attractive for companies because
of several advantages. In the most general sense,
it can contribute to an increase in profit.
Information is huge.
Information is diverse.
Information is redundant.




Discovery of useful information from web
contents / data / documents



The data mining techniques applied are:
Classification
Clustering
Associations






Given:
- A source of textual documents.
- A similarity measure, e.g., how many words these
  documents have in common.

Find:
Several clusters of documents that are relevant to each other.
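The clustering task above can be illustrated with a minimal Python sketch. It assumes a word-overlap (Jaccard) similarity and a simple greedy grouping rule; the function names, the threshold value, and the sample documents are all hypothetical choices for illustration, not a method taken from the slides.

```python
def word_overlap(doc_a: str, doc_b: str) -> float:
    """Jaccard similarity: shared words divided by all distinct words."""
    words_a = set(doc_a.lower().split())
    words_b = set(doc_b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def cluster_documents(docs: list[str], threshold: float = 0.3) -> list[list[int]]:
    """Greedy single-pass clustering (an assumed, simplistic strategy).

    Each document joins the first cluster whose first member is similar
    enough; otherwise it starts a new cluster. Returns lists of indices.
    """
    clusters: list[list[int]] = []
    for i, doc in enumerate(docs):
        for cluster in clusters:
            if word_overlap(doc, docs[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical sample documents.
docs = [
    "web mining finds knowledge in web data",
    "data mining finds knowledge in data",
    "cheap flights and hotel deals",
    "hotel deals and cheap holiday flights",
]
clusters = cluster_documents(docs)
print(clusters)
```

With these sample documents, the two mining texts end up in one cluster and the two travel texts in another. Real systems would typically use weighted term vectors and a more robust clustering algorithm.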


Association rules: discover similarity among sets of
items across transactions.

X =====> Y
where X, Y are sets of items,
confidence = P(Y | X),
support = P(X ^ Y)
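As a small worked example, the sketch below computes support and confidence for a rule X => Y, using the standard definitions: support is the fraction of transactions containing both X and Y, and confidence is that fraction divided by the fraction containing X, i.e. P(Y | X). The transactions and page paths are hypothetical.

```python
def support(transactions: list[set[str]], items: set[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions: list[set[str]], x: set[str], y: set[str]) -> float:
    """Estimate of P(Y | X): support of X and Y together over support of X."""
    support_x = support(transactions, x)
    if support_x == 0:
        return 0.0
    return support(transactions, x | y) / support_x


# Hypothetical transactions: the set of pages each client requested.
transactions = [
    {"/products/", "/products/software/", "/order/"},
    {"/products/", "/products/software/"},
    {"/products/", "/about/"},
    {"/products/software/", "/order/"},
]

x = {"/products/software/"}
y = {"/order/"}
rule_support = support(transactions, x | y)
rule_confidence = confidence(transactions, x, y)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

Here two of four transactions contain both pages, and three contain the software page, so the rule has support 0.50 and confidence of about 0.67.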
Classification: the task of generalizing known
structure to apply to new data.
For example, an e-mail program might attempt to
classify an e-mail as "legitimate" or as "spam".
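The spam example can be sketched with a tiny naive Bayes classifier. Naive Bayes is chosen here only as one common illustration of classification; the slides do not name a specific algorithm, and the training messages and function names are hypothetical.

```python
import math
from collections import Counter


def train(examples: list[tuple[str, str]]):
    """Count words per label and messages per label from (text, label) pairs."""
    word_counts: dict[str, Counter] = {}
    label_counts: Counter = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def classify(text: str, word_counts: dict[str, Counter], label_counts: Counter) -> str:
    """Return the label with the highest log-probability (add-one smoothing)."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total = sum(label_counts.values())
    best_label, best_score = "", -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total)
        n_words = sum(counts.values())
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: the "known structure" to generalize from.
examples = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "legitimate"),
    ("project report attached", "legitimate"),
]
word_counts, label_counts = train(examples)
print(classify("claim your free money", word_counts, label_counts))
print(classify("monday project meeting", word_counts, label_counts))
```

The first message is labeled as spam and the second as legitimate, because each shares more vocabulary with the corresponding training messages.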








A typical Web graph consists of Web pages as nodes
and hyperlinks as edges connecting related pages.
Web structure mining is the process of discovering
structure information from the Web.
Web graph: a directed graph that represents the Web.
Node: each Web page is a node of the Web graph.
Link: each hyperlink on the Web is a directed edge of
the Web graph.
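A minimal sketch of the Web graph as a data structure: pages are nodes, and each hyperlink is stored as a directed edge from the linking page to the linked page. The page names and the helper function are hypothetical. Counting incoming links is shown as one simple piece of structure information that can be read off the graph.

```python
from collections import defaultdict


def build_web_graph(links: list[tuple[str, str]]):
    """Build out-link and in-link adjacency sets from (source, target) pairs."""
    out_links: dict[str, set[str]] = defaultdict(set)
    in_links: dict[str, set[str]] = defaultdict(set)
    for source, target in links:
        out_links[source].add(target)  # directed edge source -> target
        in_links[target].add(source)
    return out_links, in_links


# Hypothetical hyperlinks on a small site.
links = [
    ("index.html", "products.html"),
    ("index.html", "about.html"),
    ("about.html", "products.html"),
    ("products.html", "index.html"),
]
out_links, in_links = build_web_graph(links)

# In-degree: how many distinct pages link to each page.
in_degree = {page: len(sources) for page, sources in in_links.items()}
print(in_degree)
```

In this example, products.html has the highest in-degree because two other pages link to it.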


Web usage mining deals with understanding user behavior in
interacting with the web or with a website.

Its aim is to obtain information that may help reorganize
or adapt a website to better suit the user.


Clustering and classification
Clients who often access /products/software/webminer.html
tend to be from educational institutions.
Clients who placed an online order for software tend to be
students in the 20-25 age group.
75% of clients who download software from
/products/software/demos/ visit between 7:00 and 11:00
pm on weekends.
Sequential patterns: a set of items is followed by
another item in time order.

Web usage examples
30% of clients who visited /products/software/ had done a
search in Yahoo using the keyword “software” before their
visit.
60% of clients who placed an online order for WEBMINER
placed another online order for software within 15 days.
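Percentages like those in the usage examples can be derived from per-client visit sequences. The sketch below assumes sessions have already been extracted from server logs into ordered page lists. The session data, client names, and function name are hypothetical, and the time window from the second example (within 15 days) is left out for brevity.

```python
def sequential_confidence(sessions: dict[str, list[str]], first: str, then: str) -> float:
    """Fraction of clients who visited `first` and later visited `then`."""
    visited_first = 0
    followed = 0
    for pages in sessions.values():
        if first in pages:
            visited_first += 1
            position = pages.index(first)  # earliest visit to `first`
            if then in pages[position + 1:]:
                followed += 1
    return followed / visited_first if visited_first else 0.0


# Hypothetical sessions: pages in the order each client requested them.
sessions = {
    "client_a": ["/search", "/products/software/", "/order"],
    "client_b": ["/products/software/", "/about"],
    "client_c": ["/order", "/products/software/"],
    "client_d": ["/about"],
}
ratio = sequential_confidence(sessions, "/products/software/", "/order")
print(f"{ratio:.0%} of clients who visited the software page ordered afterwards")
```

Only client_a ordered after visiting the software page; client_c ordered before visiting it, so the order of events matters and that session does not count.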


Because search engines draw on the enormous amount of
information in web sites and web pages, it is a
challenging task to engineer, implement, and
improve a search engine.

Web mining helps with the problem of how to deal effectively
with an uncontrolled hypertext collection where anyone can
publish anything they want.


Web mining applications are used by web sites for
web search (e.g., Google and Yahoo), web
recommendations (e.g., Amazon.com), and
web advertising (e.g., Google and Yahoo).

Web site design, e.g., landing page optimization.


Cloud computing is one of today's most attractive
technology areas, due at least in part to its
cost efficiency and flexibility.

Cloud mining is a new approach to a faceted search
interface for your data. SaaS (Software-as-a-Service)
is used to reduce the cost of web mining and to
provide the security that comes with the cloud mining
technique.


Web mining fills the information gap between web users
and web designers.

Many successful techniques have been developed for
mining the web.

Cloud mining is an improved method for web mining.

The need to discover new methods and techniques for
handling the amounts of data in existence will
always exist.
Web mining
