stavies
Published in: Technology
Transcript

  • 1. BYK.RAJASEKHAR REDDY (08Q61A0528)
  • 2. Contents: Introduction Wrappers Clustering System Description Working Types Advantages and Disadvantages Conclusion
  • 3. Introduction:STAVIES is a system for Information Extraction through Automatic Web Wrapper Using clustering Techniques.
  • 4. STAVIES is used in: Automatic Information Discovery. Extraction of structured web data.
  • 5. WRAPPERS Piece of software to extract the useful information from web data sources. Data extracted is referred as Structural Tokens.
  • 6. Categories of Wrappers: Site Specific: Extracts information from a web pages or family of web pages. Generic wrappers: Can be applied to almost any page regardless of the structures.
  • 7. CLUSTERINGProcess of recognizing input data set in such a way that data points in same cluster are similar other than in different clusters.
  • 8. Quality Evaluation Measures: Cluster Compactness: Evaluates how the subsets of input are redistributed by clustering system, compared with whole input set. Cluster Separation: Indicates overall dissimilarity among the output clusters.
  • 9. System Description Two modules 1.Transformation module 2.Extraction module
  • 10. Phases: Preparation Phase: 1.Validation correction and XHTML generation. 2.Tree transformation and Terminal node selecton
  • 11. • Segmentation Phase: 1. Nodes Comparison. 2. Hierarchical clustering. 3. Cluster Evaluation and Target area Discover. 4. Boundary selection.
  • 12. • Information Retrieval Phase: 1. Information Extraction component.
  • 13. Working:
  • 14. Experimental Results:
  • 15. Types: OMINI MDR
  • 16. Advantages: Executes in less than 0.4 sec. No human assistance is required. High performance.
  • 17. Disadvantage: Hard to implement in free texts and non-template pages.
  • 18. Conclusion STAVIES saves precious time and effort. Tested successfully in more than 63,000 HTML pages from 50 different web data sources.
  • 19. THANK YOU.
  • 20. Queries????
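Slide 10 lists "Tree transformation and terminal node selection" as a preparation step. As a minimal sketch, assuming the page has already been validated and converted to well-formed XHTML, the step can be approximated with Python's standard `html.parser`: walk the tag tree and keep every non-empty text node together with its root-to-node tag path. The class `TerminalNodeCollector`, the function `terminal_nodes`, and the slash-separated tag-path representation are hypothetical choices for illustration, not STAVIES' actual implementation.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed onto the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TerminalNodeCollector(HTMLParser):
    """Hypothetical stand-in for terminal node selection.

    Collects (tag_path, text) pairs for every non-empty text node,
    assuming the input is already well-formed (X)HTML.
    """

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags, root first
        self.terminals = []  # (tag_path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text between tags
            self.terminals.append(("/".join(self.stack), text))


def terminal_nodes(html):
    """Return the terminal text nodes of `html` with their tag paths."""
    parser = TerminalNodeCollector()
    parser.feed(html)
    parser.close()
    return parser.terminals


page = ("<html><body><ul><li><b>Title A</b> $10</li>"
        "<li><b>Title B</b> $12</li></ul></body></html>")
for path, text in terminal_nodes(page):
    print(path, "->", text)
```

Repeated records in a result list produce repeated tag paths (here `html/body/ul/li/b` for each title), which is the regularity a later clustering step can exploit.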
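Slide 11's "Node comparison" and "Hierarchical clustering" steps group terminal nodes that play the same structural role. The sketch below is illustrative only: it assumes a hypothetical node-comparison measure (`path_distance`, one minus the shared tag-path prefix divided by the longer path length) and plain single-linkage agglomerative clustering with a distance cutoff. Both may differ from the similarity measure and stopping rule STAVIES actually uses.

```python
def path_distance(p, q):
    """Hypothetical node-comparison measure for two slash-separated tag paths:
    1 - (length of shared prefix) / (length of the longer path).
    Identical paths score 0.0; paths with different first tags score 1.0."""
    a, b = p.split("/"), q.split("/")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return 1.0 - common / max(len(a), len(b))


def agglomerative(items, dist, threshold):
    """Single-linkage agglomerative clustering (illustrative stand-in).

    Starts with one cluster per item and repeatedly merges the two closest
    clusters until the closest remaining pair is farther apart than
    `threshold`. Returns clusters as lists of item indices.
    """
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(dist(items[a], items[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # no remaining pair is close enough to merge
        # j > i, so popping j leaves index i unchanged.
        clusters[i].extend(clusters.pop(j))
    return clusters


paths = ["html/body/ul/li/b", "html/body/ul/li",
         "html/body/ul/li/b", "html/body/ul/li", "html/body/div/p"]
print(agglomerative(paths, path_distance, 0.0))   # [[0, 2], [1, 3], [4]]
print(agglomerative(paths, path_distance, 0.25))  # [[0, 2, 1, 3], [4]]
```

Nodes with identical paths merge first; raising `threshold` merges progressively less similar groups, producing the hierarchy that a cluster-evaluation step would then inspect.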
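The compactness and separation measures on slide 8 can be made concrete with one common formulation, assumed here rather than taken from the slides. The deviation v(X) of a point set is the root-mean-square distance of its points from their centroid. Compactness is Cmp = (1/C) · Σ v(c_i) / v(X) over the C output clusters, and separation is Sep = 1/(C(C-1)) · Σ_{i≠j} exp(-d²(m_i, m_j) / (2σ²)) over the cluster centroids m_i, with σ a tunable constant. Lower values of both indicate tighter, better-separated clusters. The function names and the default `sigma=1.0` are hypothetical, and points are plain numeric tuples.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def deviation(points):
    """v(X): root-mean-square distance of the points from their centroid."""
    c = centroid(points)
    return math.sqrt(sum(sq_dist(p, c) for p in points) / len(points))


def compactness(clusters):
    """Assumed formulation: Cmp = (1/C) * sum_i v(c_i) / v(X).
    Lower means tighter clusters; requires v(X) > 0."""
    all_points = [p for cl in clusters for p in cl]
    v_x = deviation(all_points)
    return sum(deviation(cl) for cl in clusters) / (len(clusters) * v_x)


def separation(clusters, sigma=1.0):
    """Assumed formulation:
    Sep = 1/(C(C-1)) * sum_{i != j} exp(-d^2(m_i, m_j) / (2 sigma^2)).
    Lower means centroids are farther apart; requires at least two clusters."""
    cents = [centroid(cl) for cl in clusters]
    c = len(cents)
    total = sum(math.exp(-sq_dist(cents[i], cents[j]) / (2 * sigma ** 2))
                for i in range(c) for j in range(c) if i != j)
    return total / (c * (c - 1))


good = [[(0.0,), (2.0,)], [(10.0,), (12.0,)]]
bad = [[(0.0,), (10.0,)], [(2.0,), (12.0,)]]
print(compactness(good), separation(good))
print(compactness(bad), separation(bad))
```

For the same four points, the grouping `good` scores lower on both measures than `bad`, which is the comparison a cluster-evaluation step would use to prefer one partition over another.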
