3. Introduction:
STAVIES is a system for
information extraction through
automatic web wrapper generation
using clustering techniques.
4. STAVIES is used in:
Automatic Information Discovery.
Extraction of structured web data.
5. WRAPPERS
A piece of software that extracts
useful information from web data
sources.
The extracted data is referred to as
Structural Tokens.
6. Categories of Wrappers:
Site Specific:
Extracts information from a single
web page
or a family of web pages.
Generic wrappers:
Can be applied to almost any page,
regardless of its structure.
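
To make the site-specific category concrete, here is a minimal sketch of a wrapper tied to one page layout. The layout (prices inside `<span class="price">`) and the names `PriceWrapper` and `extract_prices` are hypothetical, chosen only for illustration; this is not how STAVIES itself works. A generic wrapper would have to discover such a pattern on its own instead of having it hard-coded.

```python
from html.parser import HTMLParser


class PriceWrapper(HTMLParser):
    """Site-specific wrapper for a hypothetical layout:
    collects the text found inside <span class="price"> elements."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(data.strip())


def extract_prices(html):
    """Run the wrapper over one page and return the extracted tokens."""
    parser = PriceWrapper()
    parser.feed(html)
    parser.close()
    return parser.prices
```

The wrapper breaks as soon as the site changes its markup, which is the usual motivation for generic wrappers.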
7. CLUSTERING
Process of reorganizing the input data
set in such a way that data points in
the same cluster are more similar to
each other than to points in
different clusters.
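
The definition above can be illustrated with a very small sketch. It groups one-dimensional values by starting a new cluster whenever the gap to the previous value exceeds a threshold, which in one dimension amounts to single-linkage clustering cut at that distance. The function name `cluster_1d` and the parameter `max_gap` are assumptions made for this example, and the slides do not state that STAVIES uses this particular procedure.

```python
def cluster_1d(values, max_gap):
    """Group numbers so that neighbouring members of a cluster
    are at most max_gap apart (illustrative, not the STAVIES algorithm)."""
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= max_gap:
            # close enough to the last value of the current cluster
            clusters[-1].append(v)
        else:
            # gap too large (or first value): open a new cluster
            clusters.append([v])
    return clusters
```

For example, `cluster_1d([1, 2, 10, 11, 12, 30], 2)` groups the values into three clusters: the small values, the values around 10, and the outlier 30.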
8. Quality Evaluation Measures:
Cluster Compactness:
Evaluates how the subsets of the input are redistributed
by the clustering system, compared with the whole input set.
Cluster Separation:
Indicates overall dissimilarity among the output
clusters.
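
The slides give no formulas for these measures, so the following is only a sketch under stated assumptions: compactness is taken as the average within-cluster deviation divided by the deviation of the whole input set (lower means tighter clusters), and separation as the mean distance between cluster centroids (higher means more dissimilar clusters). Other formulations exist, and the function names are hypothetical.

```python
import math
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def deviation(points):
    """Root-mean-square distance of the points from their centroid."""
    c = centroid(points)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in points) / len(points))


def compactness(clusters):
    """Assumed form: mean cluster deviation relative to the whole input set.
    Requires that the input points are not all identical."""
    all_points = [p for cluster in clusters for p in cluster]
    whole = deviation(all_points)
    return sum(deviation(c) for c in clusters) / (len(clusters) * whole)


def separation(clusters):
    """Assumed form: mean pairwise distance between cluster centroids.
    Requires at least two clusters."""
    cents = [centroid(c) for c in clusters]
    pairs = list(combinations(cents, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)
```

With two tight, far-apart clusters such as `[[(0, 0), (0, 2)], [(10, 0), (10, 2)]]`, the compactness value is small relative to 1 and the separation equals the distance between the two centroids.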