stavies

Published in: Technology

  1. By K. Rajasekhar Reddy (08Q61A0528)
  2. Contents: Introduction; Wrappers; Clustering; System Description; Working; Types; Advantages and Disadvantages; Conclusion.
  3. 3. Introduction:STAVIES is a system for Information Extraction through Automatic Web Wrapper Using clustering Techniques.
  4. 4. STAVIES is used in: Automatic Information Discovery. Extraction of structured web data.
  5. 5. WRAPPERS Piece of software to extract the useful information from web data sources. Data extracted is referred as Structural Tokens.
  6. 6. Categories of Wrappers: Site Specific: Extracts information from a web pages or family of web pages. Generic wrappers: Can be applied to almost any page regardless of the structures.
  7. 7. CLUSTERINGProcess of recognizing input data set in such a way that data points in same cluster are similar other than in different clusters.
  8. 8. Quality Evaluation Measures: Cluster Compactness: Evaluates how the subsets of input are redistributed by clustering system, compared with whole input set. Cluster Separation: Indicates overall dissimilarity among the output clusters.
  9. System description: Two modules: (1) transformation module; (2) extraction module.
  10. Phases: Preparation phase: (1) validation, correction, and XHTML generation; (2) tree transformation and terminal node selection.
  11. Segmentation phase: (1) node comparison; (2) hierarchical clustering; (3) cluster evaluation and target area discovery; (4) boundary selection.
  12. Information retrieval phase: (1) information extraction component.
  13. Working:
  14. Experimental results:
  15. Types: OMINI, MDR.
  16. Advantages: Executes in less than 0.4 s. Requires no human assistance. High performance.
  17. Disadvantage: Hard to apply to free text and non-template pages.
  18. Conclusion: STAVIES saves precious time and effort. It was tested successfully on more than 63,000 HTML pages from 50 different web data sources.
  19. Thank you.
  20. Queries?
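
The segmentation phase in the slides (hierarchical clustering followed by cluster evaluation) can be illustrated with a small sketch. This is not the STAVIES implementation. It assumes each terminal node is described only by its depth in the page tree, uses plain single-linkage clustering, and scores the result with one common formulation of compactness (mean within-cluster variance divided by the variance of the whole input) and separation (a Gaussian of the distance between cluster centroids). The slides give no formulas, so these definitions, the function names, and the `threshold` and `sigma` parameters are all assumptions made for illustration.

```python
from itertools import combinations
from math import exp
from statistics import mean, pvariance


def single_linkage(points, threshold):
    """Agglomerative clustering (assumed variant: single linkage).

    Start with one cluster per point and repeatedly merge the two closest
    clusters until the smallest gap between clusters exceeds `threshold`.
    """
    clusters = [[p] for p in sorted(points)]
    while len(clusters) > 1:
        # Gap between two clusters = distance between their closest members.
        (i, j), gap = min(
            (((a, b), min(abs(x - y) for x in clusters[a] for y in clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if gap > threshold:
            break
        # i < j always holds, so removing cluster j does not shift cluster i.
        clusters[i].extend(clusters.pop(j))
    return clusters


def compactness(clusters):
    """Assumed definition: mean within-cluster variance / whole-set variance.

    Lower values mean tighter clusters relative to the whole input set.
    """
    everything = [p for c in clusters for p in c]
    return mean(pvariance(c) for c in clusters) / pvariance(everything)


def separation(clusters, sigma=1.0):
    """Assumed definition: mean Gaussian similarity between cluster centroids.

    Values near 0 mean the clusters are far apart (highly dissimilar).
    """
    centroids = [mean(c) for c in clusters]
    pairs = list(combinations(centroids, 2))
    if not pairs:  # a single cluster has nothing to be separated from
        return 0.0
    return mean(exp(-((a - b) ** 2) / (2 * sigma ** 2)) for a, b in pairs)


# Hypothetical depths of terminal nodes in a transformed page tree.
node_depths = [3, 3, 4, 9, 9, 10, 10, 9]
clusters = single_linkage(node_depths, threshold=2)
print("clusters:", clusters)
print("compactness:", compactness(clusters))
print("separation:", separation(clusters))
```

In this toy input the shallow nodes and the deep nodes end up in separate clusters. A low compactness together with a low separation score would mark this partition as a good candidate when looking for the target area.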
