Data Mining: Text and web mining


Published on

Data Mining: Text and web mining

Published in: Technology
  • Be the first to comment

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Data Mining: Text and web mining

  1. 1. Text and Web Mining<br />
  2. 2. What is Text Mining?<br />Text Data Analysis and Information Retrieval Information retrieval (IR) is a field that has been developing in parallel with database systems for many years.<br />Text mining is process of analyzing huge text data to retrieve the information from it.<br />
  3. 3. Basic Measures for Text Retrieval<br />Precision: This is the percentage of retrieved documents that are in fact relevant tothe query (i.e., “correct” responses). It is formally defined as<br />Recall: This is the percentage of documents that are relevant to the query and were,in fact, retrieved.<br />
  4. 4. Retrieval and Indexing<br />Text Retrieval Methods<br /> 1) Document selection methods2) Document ranking methods<br />Text Indexing Techniques<br /> 1) Inverted indices2) Signature files.<br />
  5. 5. Query Processing Techniques<br />Once an inverted index is created for a document collection, a retrieval system can answer a keyword query quickly by looking up which documents contain the query keywords.<br />
  6. 6. Ways of dimensionality Reduction for Text1)Latent Semantic Indexing2) Locality Preserving Indexing3) Probabilistic Latent Semantic Indexing<br />Probabilistic Latent Semantic Indexing schemas :<br />1) Keyword-Based Association Analysis2) Document Classification Analysis3) Document Clustering Analysis<br />
  7. 7. Mining WWW<br />Mining World wide web<br />The WWW is a huge, widely distributed, global information service center for news, advertisements , management, education, government, and many other information services.<br /> The Web also contains a rich and dynamic collection of hyperlink information and Web page access and usage information, providing rich sources for data mining.<br />
  8. 8. Challenges in mining WWW<br />The Web seems to be too huge for effective data warehousing and data mining<br />The complexity of Web pages is far greater than that of any traditional text document collection<br />The Web is a highly dynamic information source<br />The Web serves a broad diversity of user communities<br />Only a small portion of the information on the Web is truly relevant or useful<br />
  9. 9. Web Usage Mining<br />Web usage mining is the third category in web mining. <br />This type of web mining allows for the collection of Web access information for Web pages.<br /> This usage data provides the paths leading to accessed Web pages. <br />This information is often gathered automatically into access logs via the Web server.<br />
  10. 10. Visit more self help tutorials<br />Pick a tutorial of your choice and browse through it at your own pace.<br />The tutorials section is free, self-guiding and will not involve any additional support.<br />Visit us at<br />