Data Mining: Text and web mining
Published in: Technology

  • 1. Text and Web Mining
  • 2. What is Text Mining?
    Text Data Analysis and Information Retrieval
    Information retrieval (IR) is a field that has been developing in parallel with database systems for many years.
    Text mining is the process of analyzing large volumes of text data to extract information from them.
  • 3. Basic Measures for Text Retrieval
    Precision: the percentage of retrieved documents that are in fact relevant to the query (i.e., “correct” responses). It is formally defined as precision = |{Relevant} ∩ {Retrieved}| / |{Retrieved}|.
    Recall: the percentage of documents relevant to the query that were in fact retrieved. It is formally defined as recall = |{Relevant} ∩ {Retrieved}| / |{Relevant}|.
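    As a minimal sketch of the two measures, the snippet below computes precision and recall from two sets of document IDs. The function name `precision_recall` and the sample IDs are illustrative assumptions, not part of the original slides.

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) for two collections of document IDs."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = relevant & retrieved  # documents that are both relevant and retrieved
    # Guard against empty sets so the ratios are always defined.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 relevant documents, 3 retrieved, 2 in common.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d2", "d3", "d5"}
precision, recall = precision_recall(relevant, retrieved)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

    Here two of the three retrieved documents are relevant, and two of the four relevant documents were retrieved, so precision is 2/3 and recall is 1/2.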
  • 4. Retrieval and Indexing
    Text Retrieval Methods
    1) Document selection methods
    2) Document ranking methods
    Text Indexing Techniques
    1) Inverted indices
    2) Signature files
  • 5. Query Processing Techniques
    Once an inverted index is created for a document collection, a retrieval system can answer a keyword query quickly by looking up which documents contain the query keywords.
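    The lookup described above can be sketched as follows. This is a simplified illustration, assuming whitespace tokenization, lowercase matching, and AND semantics across query keywords; the function names and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents containing every keyword in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # One posting set per keyword; an unknown keyword yields an empty set.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical three-document collection.
docs = {
    1: "text mining retrieves information from text",
    2: "web mining analyzes hyperlink structure",
    3: "web usage mining reads access logs",
}
index = build_inverted_index(docs)
print(search(index, "web mining"))
```

    The index is built once; each query then costs only a few dictionary lookups and a set intersection, rather than a scan of every document.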
  • 6. Ways of Dimensionality Reduction for Text
    1) Latent Semantic Indexing
    2) Locality Preserving Indexing
    3) Probabilistic Latent Semantic Indexing
    Text Mining Approaches:
    1) Keyword-Based Association Analysis
    2) Document Classification Analysis
    3) Document Clustering Analysis
  • 7. Mining the WWW
    Mining the World Wide Web
    The WWW is a huge, widely distributed, global information service center for news, advertisements, management, education, government, and many other information services.
    The Web also contains a rich and dynamic collection of hyperlink information and Web page access and usage information, providing rich sources for data mining.
  • 8. Challenges in Mining the WWW
    The Web seems to be too huge for effective data warehousing and data mining
    The complexity of Web pages is far greater than that of any traditional text document collection
    The Web is a highly dynamic information source
    The Web serves a broad diversity of user communities
    Only a small portion of the information on the Web is truly relevant or useful
  • 9. Web Usage Mining
    Web usage mining is the third category of web mining, alongside web content mining and web structure mining.
    This type of web mining collects access information for Web pages.
    The usage data records the paths that lead to accessed Web pages.
    This information is often gathered automatically into access logs via the Web server.
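    As a rough sketch of how such paths might be recovered from an access log, the snippet below groups requested pages by client host. It assumes the server writes the Common Log Format, that lines are in chronological order, and that only successful (status 200) requests count as page accesses; the sample log lines and the name `paths_by_host` are made up for illustration.

```python
import re
from collections import defaultdict

# Common Log Format: host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def paths_by_host(lines):
    """Return, per client host, the ordered list of successfully accessed paths."""
    paths = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line)
        # Skip malformed lines and non-200 responses (an assumption of this sketch).
        if match and match.group("status") == "200":
            paths[match.group("host")].append(match.group("path"))
    return dict(paths)


# Hypothetical log excerpt.
sample_log = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:55:40 -0700] "GET /about.html HTTP/1.0" 200 1200',
    '10.0.0.1 - - [10/Oct/2000:13:56:01 -0700] "GET /products.html HTTP/1.0" 200 5120',
    '10.0.0.1 - - [10/Oct/2000:13:56:30 -0700] "GET /missing.html HTTP/1.0" 404 210',
]
visits = paths_by_host(sample_log)
print(visits)
```

    Grouping by host is only a crude stand-in for a user session; real usage mining typically also splits visits by time gaps and user agent.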