Data Mining: Text and web mining
Published in: Technology
Transcript

  • 1. Text and Web Mining
  • 2. What is Text Mining?
    Text Data Analysis and Information Retrieval
    Information retrieval (IR) is a field that has been developing in parallel with database systems for many years.
    Text mining is the process of analyzing large volumes of text data to extract information from them.
  • 3. Basic Measures for Text Retrieval
    Precision: the percentage of retrieved documents that are in fact relevant to the query (i.e., “correct” responses). It is formally defined as $\text{precision} = \frac{|\{\text{Relevant}\} \cap \{\text{Retrieved}\}|}{|\{\text{Retrieved}\}|}$.
    Recall: the percentage of documents relevant to the query that were in fact retrieved.
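The two measures on this slide can be worked through on a small example. The sketch below is only an illustration: the function name `precision_recall` and the document IDs are made up, and both measures are returned as fractions rather than percentages.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, as fractions in [0, 1]."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    # Guard against empty sets so the division is always defined.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs, chosen only for illustration.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were retrieved, so the two measures differ even though they share the same numerator.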
  • 4. Retrieval and Indexing
    Text Retrieval Methods
      1) Document selection methods
      2) Document ranking methods
    Text Indexing Techniques
      1) Inverted indices
      2) Signature files
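To make "document ranking methods" concrete, here is a minimal sketch of one common approach: TF-IDF weighting with cosine similarity. The slide does not name a specific ranking scheme, so this choice, the smoothed IDF formula, the whitespace tokenizer, and the sample documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; real systems also stem and drop stop words.
    return text.lower().split()


def rank(query, docs):
    """Return (doc_id, score) pairs sorted by descending TF-IDF cosine similarity."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}

    def vectorize(tokens):
        tf = Counter(tokens)
        # Terms unseen in the collection carry no weight.
        return {term: tf[term] * idf[term] for term in tf if term in idf}

    def cosine(a, b):
        dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)

    query_vec = vectorize(tokenize(query))
    scores = [(doc_id, cosine(query_vec, vectorize(tokens)))
              for doc_id, tokens in tokenized.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical three-document collection.
docs = {
    "d1": "text mining extracts information from text",
    "d2": "web mining analyzes hyperlinks and usage logs",
    "d3": "database systems store structured records",
}
ranking = rank("text mining", docs)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Unlike a document selection method, which returns only the documents that satisfy the query exactly, this ranking assigns every document a score, so partially matching documents such as `d2` still appear, lower in the list.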
  • 5. Query Processing Techniques
    Once an inverted index is created for a document collection, a retrieval system can answer a keyword query quickly by looking up which documents contain the query keywords.
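The lookup described on this slide can be sketched in a few lines. This is a simplified illustration, assuming whitespace tokenization and a query that requires every keyword to match (AND semantics); the function names and sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every keyword in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # One posting set per keyword; an unknown keyword yields an empty set.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical document collection.
docs = {
    "d1": "text mining and information retrieval",
    "d2": "web mining of hyperlinks and usage logs",
    "d3": "database systems and data warehousing",
}
index = build_inverted_index(docs)

print(search(index, "mining"))
print(search(index, "web mining"))
```

The index is built once, so each query costs only a few dictionary lookups and set intersections instead of a scan over every document.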
  • 6. Ways of Dimensionality Reduction for Text
      1) Latent Semantic Indexing
      2) Locality Preserving Indexing
      3) Probabilistic Latent Semantic Indexing
    Probabilistic Latent Semantic Indexing schemas:
      1) Keyword-Based Association Analysis
      2) Document Classification Analysis
      3) Document Clustering Analysis
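As a rough illustration of keyword-based association analysis, the sketch below counts how often pairs of keywords occur in the same document and keeps the pairs whose support reaches a threshold. It is only the pair-counting step, not a full association rule miner, and the function name, threshold, and keyword lists are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_keyword_pairs(doc_keywords, min_support):
    """Return {(kw_a, kw_b): support} for pairs co-occurring in enough documents."""
    n_docs = len(doc_keywords)
    pair_counts = Counter()
    for keywords in doc_keywords:
        # Sort and deduplicate so each pair is counted once per document,
        # always in the same (alphabetical) order.
        for pair in combinations(sorted(set(keywords)), 2):
            pair_counts[pair] += 1
    return {pair: count / n_docs
            for pair, count in pair_counts.items()
            if count / n_docs >= min_support}


# Hypothetical keyword sets, one list per document.
doc_keywords = [
    ["web", "mining", "hyperlink"],
    ["text", "mining", "retrieval"],
    ["web", "mining", "usage"],
    ["text", "retrieval", "index"],
]
pairs = frequent_keyword_pairs(doc_keywords, min_support=0.5)
print(pairs)
```

With a support threshold of 0.5, only the pairs that appear together in at least two of the four documents survive.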
  • 7. Mining the WWW
    Mining the World Wide Web
    The WWW is a huge, widely distributed, global information service center for news, advertisements, management, education, government, and many other information services.
    The Web also contains a rich and dynamic collection of hyperlink information and Web page access and usage information, providing rich sources for data mining.
  • 8. Challenges in Mining the WWW
    The Web seems to be too huge for effective data warehousing and data mining.
    The complexity of Web pages is far greater than that of any traditional text document collection.
    The Web is a highly dynamic information source.
    The Web serves a broad diversity of user communities.
    Only a small portion of the information on the Web is truly relevant or useful.
  • 9. Web Usage Mining
    Web usage mining is the third category of Web mining, alongside Web content mining and Web structure mining.
    This type of Web mining collects access information for Web pages.
    The usage data reveals the paths that lead to accessed Web pages.
    This information is often gathered automatically into access logs by the Web server.
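A first step in Web usage mining is turning raw access logs into per-visitor paths. The sketch below assumes log lines in the Common Log Format, already in chronological order, and treats the client host as the visitor; the regular expression, the function name, and the sample log lines are illustrative assumptions, and real pipelines usually add session splitting and robot filtering.

```python
import re
from collections import defaultdict

# Common Log Format: host ident user [time] "METHOD url PROTOCOL" status size
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def paths_by_visitor(log_lines):
    """Group successfully served URLs by client host, in log order."""
    paths = defaultdict(list)
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed format
        if match.group("status").startswith("2"):  # keep 2xx responses only
            paths[match.group("host")].append(match.group("url"))
    return dict(paths)


# Hypothetical access log entries (documentation-range IP addresses).
log_lines = [
    '192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.2 - - [10/Oct/2023:13:55:40 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.1 - - [10/Oct/2023:13:56:01 +0000] "GET /tutorials.html HTTP/1.1" 200 5120',
    '192.0.2.1 - - [10/Oct/2023:13:56:30 +0000] "GET /missing.html HTTP/1.1" 404 512',
]
paths = paths_by_visitor(log_lines)
for host, path in paths.items():
    print(host, " -> ".join(path))
```

The resulting paths are the raw material for later analysis, such as finding frequently followed page sequences.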
  • 10. Visit more self-help tutorials
    Pick a tutorial of your choice and browse through it at your own pace.
    The tutorials section is free, self-guiding and will not involve any additional support.
    Visit us at www.dataminingtools.net
