• Like
  • Save
Data Mining: Text and web mining
Upcoming SlideShare
Loading in...5

Thanks for flagging this SlideShare!

Oops! An error has occurred.

Data Mining: Text and web mining


Data Mining: Text and web mining

Data Mining: Text and web mining

Published in Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads


Total Views
On SlideShare
From Embeds
Number of Embeds



Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

    No notes for slide


  • 1. Text and Web Mining
  • 2. What is Text Mining?
    Text Data Analysis and Information Retrieval Information retrieval (IR) is a field that has been developing in parallel with database systems for many years.
    Text mining is process of analyzing huge text data to retrieve the information from it.
  • 3. Basic Measures for Text Retrieval
    Precision: This is the percentage of retrieved documents that are in fact relevant tothe query (i.e., “correct” responses). It is formally defined as
    Recall: This is the percentage of documents that are relevant to the query and were,in fact, retrieved.
  • 4. Retrieval and Indexing
    Text Retrieval Methods
    1) Document selection methods2) Document ranking methods
    Text Indexing Techniques
    1) Inverted indices2) Signature files.
  • 5. Query Processing Techniques
    Once an inverted index is created for a document collection, a retrieval system can answer a keyword query quickly by looking up which documents contain the query keywords.
  • 6. Ways of dimensionality Reduction for Text1)Latent Semantic Indexing2) Locality Preserving Indexing3) Probabilistic Latent Semantic Indexing
    Probabilistic Latent Semantic Indexing schemas :
    1) Keyword-Based Association Analysis2) Document Classification Analysis3) Document Clustering Analysis
  • 7. Mining WWW
    Mining World wide web
    The WWW is a huge, widely distributed, global information service center for news, advertisements , management, education, government, and many other information services.
    The Web also contains a rich and dynamic collection of hyperlink information and Web page access and usage information, providing rich sources for data mining.
  • 8. Challenges in mining WWW
    The Web seems to be too huge for effective data warehousing and data mining
    The complexity of Web pages is far greater than that of any traditional text document collection
    The Web is a highly dynamic information source
    The Web serves a broad diversity of user communities
    Only a small portion of the information on the Web is truly relevant or useful
  • 9. Web Usage Mining
    Web usage mining is the third category in web mining.
    This type of web mining allows for the collection of Web access information for Web pages.
    This usage data provides the paths leading to accessed Web pages.
    This information is often gathered automatically into access logs via the Web server.
  • 10. Visit more self help tutorials
    Pick a tutorial of your choice and browse through it at your own pace.
    The tutorials section is free, self-guiding and will not involve any additional support.
    Visit us at www.dataminingtools.net