Web mining



This is a presentation about Web mining. I hope it helps you in your research.

  • Web mining

    1. Priyabrata Satapathy
    2. Mining refers to extracting something valuable from a larger source. Anand Bihari
    3. Data mining refers to extracting, or “mining,” knowledge from large amounts of data. The mining of gold from rocks or sand is called gold mining, not rock or sand mining. Thus, data mining would more appropriately have been named “knowledge mining from data,” which is unfortunately somewhat long. “Knowledge mining,” a shorter term, may not reflect the emphasis on mining from large amounts of data.
    4. The Web is a collection of interrelated files on one or more Web servers. Its main characteristics:
       • Huge: over 1 billion pages, 15 terabytes.
       • Wealth of information: present everywhere.
       • Highly dynamic: sites are constantly being registered and closed.
       • Structure: a graph, with links between pages.
       • Access: hundreds of millions of requests per day.
    5. Web mining is the application of data mining techniques to extract knowledge from Web data, including Web documents, hyperlinks between documents, usage logs of websites, etc. Web data falls into three kinds:
       • Web content: text, images, records, etc.
       • Web structure: hyperlinks, tags, etc.
       • Web usage: HTTP logs, application server logs, etc.
    6. Traditional data mining versus Web data:
       • Traditional data mining: data is structured and relational, with well-defined tables, columns, rows, keys, and constraints.
       • Web data: semi-structured and unstructured, readily available, and rich in features and patterns.
    7. Applications of Web mining:
       • E-commerce: user profiles, targeted advertising.
       • Network management: performance management, fault management.
       • Information retrieval (search) on the Web.
    8. Taxonomy of Web mining:
       • Web content mining: text, image, audio, video, record.
       • Web structure mining: hyperlinks (intra-document and inter-document) and document structure.
       • Web usage mining: Web server logs, application server logs, application-level logs.
    9. A typical Web graph has Web pages as nodes and hyperlinks as edges connecting related pages. Web structure mining is the process of discovering structural information from the Web. It can be performed either at the document (intra-page) level or at the hyperlink (inter-page) level.
    10. Web graph terminology:
       • Web graph: a directed graph that represents the Web.
       • Node: each Web page is a node of the Web graph.
       • Link: each hyperlink on the Web is a directed edge of the Web graph.
       • In-degree: the in-degree of a node p is the number of distinct links that point to p.
       • Out-degree: the out-degree of a node p is the number of distinct links originating at p that point to other nodes.
    11. Path-related definitions:
       • Directed path: a sequence of links, starting from p, that can be followed to reach q.
       • Shortest path: of all the paths from node p to node q, the one with the shortest length, i.e., the fewest links.
       • Diameter: the maximum, over all pairs of nodes p and q in the Web graph, of the shortest-path length from p to q.
    12. Literature survey (title; journal or conference; year):
       • “Mining Web Informative Structures and Contents Based on Entropy Analysis”; IEEE Transactions on Knowledge and Data Engineering; 2004.
       • “WISDOM: Web Intrapage Informative Structure Mining Based on Document Object Model”; IEEE Transactions on Knowledge and Data Engineering; 2005.
       • “Knowledge Discovery and Retrieval on World Wide Web Using Web Structure Mining”; 2010 Fourth Asia International Conference on Mathematical/Analytical Modelling and Computer Simulation; 2010.
       • “Design and Implementation of a Web Structure Mining Algorithm Using Breadth First Search Strategy for Academic Search Application”; International Conference on Internet Technology and Secured Transactions; 2011.
    13. Problem identification: after studying these journal and conference papers, we will identify a research problem and pursue it.
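As a small illustration of the usage data named in slide 5, the following Python sketch counts successful requests per page in a Web server log. It assumes the log is in the Common Log Format used by Apache and NCSA servers; the sample lines, IP addresses, and the names `CLF_PATTERN`, `page_hits`, and `unique_hosts` are hypothetical. The pattern ignores anything after the size field, so Combined Log Format lines also match, but other log formats would need a different pattern.

```python
import re
from collections import Counter

# Hypothetical sample lines in the Common Log Format (CLF):
# host ident authuser [date] "request" status bytes
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.2 - - [10/Oct/2023:13:56:01 +0000] "GET /about.html HTTP/1.1" 200 1024',
    '192.0.2.1 - - [10/Oct/2023:13:57:12 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.3 - - [10/Oct/2023:13:58:40 +0000] "GET /missing.html HTTP/1.1" 404 0',
]

# Captures host, timestamp, method, path, status, and size; trailing fields are ignored.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def page_hits(lines):
    """Count successful (2xx) requests per URL path, skipping malformed lines."""
    hits = Counter()
    for line in lines:
        match = CLF_PATTERN.match(line)
        if match is None:
            continue
        if match.group("status").startswith("2"):
            hits[match.group("path")] += 1
    return hits


def unique_hosts(lines):
    """Return the set of distinct client hosts found in well-formed lines."""
    hosts = set()
    for line in lines:
        match = CLF_PATTERN.match(line)
        if match is not None:
            hosts.add(match.group("host"))
    return hosts
```

Calling `page_hits(SAMPLE_LOG)` yields 2 hits for `/index.html` and 1 for `/about.html`; the 404 request is left out because only 2xx statuses are counted.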
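Slide 9 describes the Web graph as pages connected by hyperlinks, mined at either the intra-page or the inter-page level. A minimal Python sketch of the edge-extraction step, using only the standard library's `html.parser` and `urllib.parse`, is below. It assumes the pages are already downloaded into a dictionary; the URLs, page contents, and the names `LinkExtractor`, `classify_links`, and `build_web_graph` are hypothetical. Treating a link as intra-document when it differs from the page URL only by its `#fragment` is an assumption made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, resolved to an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def classify_links(page_url, html):
    """Split a page's links into intra-document and inter-document sets.

    Assumption: a link is intra-document if, once its #fragment is removed,
    it points back to the page itself.
    """
    parser = LinkExtractor(page_url)
    parser.feed(html)
    parser.close()
    page, _ = urldefrag(page_url)
    intra, inter = set(), set()
    for link in parser.links:
        target, _ = urldefrag(link)
        if target == page:
            intra.add(link)
        else:
            inter.add(target)  # the graph node is the page, not the fragment
    return intra, inter


def build_web_graph(pages):
    """Map each page URL to the set of distinct pages it links to (directed edges)."""
    return {url: classify_links(url, html)[1] for url, html in pages.items()}


# Hypothetical pages, already fetched: URL -> HTML source.
PAGES = {
    "http://example.com/a.html": '<a href="b.html">B</a> <a href="c.html">C</a> <a href="#top">Top</a>',
    "http://example.com/b.html": '<a href="/c.html">C</a>',
    "http://example.com/c.html": "<p>No links here.</p>",
}
```

Here `build_web_graph(PAGES)` gives page a.html the edges to b.html and c.html, while its `#top` link is reported by `classify_links` as intra-document rather than as an edge.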
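The in-degree and out-degree definitions in slide 10 count distinct links, so repeated hyperlinks between the same two pages count once. A short Python sketch under that reading follows; the four-page graph, given as (source, target) pairs, and the function name `degrees` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy Web graph as (source page, target page) links.
# A links to C twice to show that duplicate links count once.
LINKS = [
    ("A", "B"), ("A", "C"), ("A", "C"),
    ("B", "C"),
    ("C", "A"),
    ("D", "C"),
]


def degrees(edges):
    """Return (in_degree, out_degree) dicts, counting each distinct link once."""
    distinct = set(edges)  # collapse duplicate links between the same pair of pages
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    nodes = set()
    for src, dst in distinct:
        out_deg[src] += 1
        in_deg[dst] += 1
        nodes.update((src, dst))
    # Report every node, including those with degree 0.
    return {n: in_deg[n] for n in nodes}, {n: out_deg[n] for n in nodes}
```

For this edge list, page C has in-degree 3 (linked from A, B, and D) even though A links to it twice, and D has in-degree 0.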
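The shortest path and diameter from slide 11 can be computed with breadth-first search, since every link counts as length 1. This Python sketch assumes the graph is a dictionary mapping each page to the set of pages it links to; the graph and the names `bfs_distances`, `shortest_path_length`, and `diameter` are hypothetical. Under the slide's definition the diameter is infinite whenever some page cannot be reached from another, so this sketch takes the maximum over reachable pairs only, a convention chosen here for illustration.

```python
from collections import deque

# Hypothetical toy Web graph: page -> set of pages it links to.
GRAPH = {
    "A": {"B", "C"},
    "B": {"C"},
    "C": {"A"},
    "D": {"C"},
}


def bfs_distances(graph, source):
    """Shortest path lengths (number of links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def shortest_path_length(graph, p, q):
    """Length of the shortest directed path from p to q, or None if q is unreachable."""
    return bfs_distances(graph, p).get(q)


def diameter(graph):
    """Maximum shortest-path length over ordered pairs (p, q), p != q, with q reachable from p."""
    best = 0
    for p in graph:
        for q, d in bfs_distances(graph, p).items():
            if q != p:
                best = max(best, d)
    return best
```

In this toy graph nothing links to D, so `shortest_path_length(GRAPH, "A", "D")` returns `None`; the longest shortest path is D to C to A to B, giving a diameter of 3.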