Data Mining: Text and web mining

Published in: Technology
Transcript of "Data Mining: Text and web mining"

  1. Text and Web Mining
  2. What is Text Mining?
     Text Data Analysis and Information Retrieval
     Information retrieval (IR) is a field that has been developing in parallel with database systems for many years.
     Text mining is the process of analyzing large volumes of text data to extract information from them.
  3. Basic Measures for Text Retrieval
     Precision: the percentage of retrieved documents that are in fact relevant to the query (i.e., "correct" responses). It is formally defined as
       precision = |{Relevant} ∩ {Retrieved}| / |{Retrieved}|
     Recall: the percentage of documents that are relevant to the query and were, in fact, retrieved. It is formally defined as
       recall = |{Relevant} ∩ {Retrieved}| / |{Relevant}|
  4. Retrieval and Indexing
     Text retrieval methods:
       1) Document selection methods
       2) Document ranking methods
     Text indexing techniques:
       1) Inverted indices
       2) Signature files
  5. Query Processing Techniques
     Once an inverted index is created for a document collection, a retrieval system can answer a keyword query quickly by looking up which documents contain the query keywords.
  6. Ways of Dimensionality Reduction for Text
       1) Latent Semantic Indexing
       2) Locality Preserving Indexing
       3) Probabilistic Latent Semantic Indexing
     Text mining approaches:
       1) Keyword-Based Association Analysis
       2) Document Classification Analysis
       3) Document Clustering Analysis
  7. Mining the World Wide Web
     The WWW is a huge, widely distributed, global information service center for news, advertisements, management, education, government, and many other information services.
     The Web also contains a rich and dynamic collection of hyperlink information and Web page access and usage information, providing rich sources for data mining.
  8. Challenges in Mining the WWW
     The Web seems too large for effective data warehousing and data mining.
     The complexity of Web pages is far greater than that of any traditional text document collection.
     The Web is a highly dynamic information source.
     The Web serves a broad diversity of user communities.
     Only a small portion of the information on the Web is truly relevant or useful.
  9. Web Usage Mining
     Web usage mining is the third category of Web mining, alongside Web content mining and Web structure mining.
     It works on the access information collected for Web pages.
     This usage data records the paths that lead to the accessed Web pages.
     The information is often gathered automatically into access logs by the Web server.
  10. Visit more self help tutorials
     Pick a tutorial of your choice and browse through it at your own pace.
     The tutorials section is free, self-guiding and will not involve any additional support.
     Visit us at www.dataminingtools.net
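
Slides 3 to 5 describe inverted indices, keyword queries, and the precision and recall measures without showing them at work. The Python sketch below is one minimal way to illustrate those ideas and is not taken from the slides. The four-document corpus, the set of relevant documents, and the function names build_inverted_index, keyword_query, and precision_recall are all assumptions made up for the example, and the query uses simple AND semantics over whitespace-separated terms.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def keyword_query(index, keywords):
    """Return ids of documents that contain every query keyword (AND)."""
    postings = [index.get(k.lower(), set()) for k in keywords]
    if not postings:
        return set()
    return set.intersection(*postings)


def precision_recall(retrieved, relevant):
    """Compute precision and recall from two sets of document ids."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy corpus; the ids and texts are illustrative only.
docs = {
    1: "text mining extracts information from text",
    2: "web mining analyzes web usage logs",
    3: "information retrieval ranks documents for a query",
    4: "mining web text for information",
}

index = build_inverted_index(docs)
retrieved = keyword_query(index, ["mining", "information"])

# Hypothetical relevance judgments for the query above.
relevant = {1, 3, 4}
precision, recall = precision_recall(retrieved, relevant)
print(sorted(retrieved), precision, recall)
```

With this toy data the query retrieves documents 1 and 4. Both are relevant, so precision is 1.0, while recall is 2/3 because relevant document 3 lacks the keyword "mining" and is missed. A real system would also need tokenization, stop-word removal, and stemming, which the sketch leaves out.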
