Earlier tools, such as RAPIER, SRV, WHISK, and WIEN, do not support extraction from complex objects.
Semistructured data (table type, richly tagged)
Semistructured text (text type, rarely tagged)
NLP-based tools: text type only
Other tools (except ontology-based): table type only
BYU ontology: both types
Ease of Use
HTML-aware tools, easiest to use
Wrapper languages, hardest to use
Other tools, in the middle
XML is the best output format for data sharing on the Web.
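As a minimal sketch of what XML output can look like, the following Python snippet serializes one extracted record with the standard library's `xml.etree.ElementTree`. The record fields and element names (`record`, `title`, `price`) and the helper `record_to_xml` are hypothetical, chosen only for illustration; they are not taken from any of the surveyed tools.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record, root_tag="record"):
    """Serialize a flat dict of extracted fields into an XML string."""
    root = ET.Element(root_tag)
    for field, value in record.items():
        # One child element per extracted field; the field name is the tag.
        child = ET.SubElement(root, field)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical record, as an extraction tool might produce it.
extracted = {"title": "Data on the Web", "price": "39.95"}
xml_text = record_to_xml(extracted)
print(xml_text)
```

Any XML-aware consumer can then read the record back without knowing how it was extracted, for example with `ET.fromstring(xml_text).find("price").text`.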
Support for Non-HTML Sources
NLP-based and ontology-based tools: supported automatically
Other tools: may support them, but need additional helpers such as syntactic and semantic analyzers
Resilience and Adaptiveness
Resilience: continuing to work properly when the target pages change
Adaptiveness: working properly with pages from other sources in the same application domain
Only BYU ontology has both features.
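To make the two notions concrete, here is a toy Python sketch. It is not the mechanism of any surveyed tool. It contrasts a hypothetical layout-bound extractor (`extract_by_position`), which reads the price from a fixed table cell, with a hypothetical content-bound extractor (`extract_by_pattern`), which looks for a price-shaped string anywhere in the page. The page snippets are invented for illustration.

```python
import re

CELL = re.compile(r"<td>(.*?)</td>")
PRICE = re.compile(r"\$\d+(?:\.\d{2})?")


def extract_by_position(html):
    """Layout-bound: assume the price is always in the second table cell."""
    cells = CELL.findall(html)
    return cells[1] if len(cells) > 1 else None


def extract_by_pattern(html):
    """Content-bound: return the first price-shaped string, wherever it is."""
    match = PRICE.search(html)
    return match.group(0) if match else None


# Invented pages: the original layout, the same source after a layout
# change, and a different source in the same application domain.
original = "<tr><td>Data on the Web</td><td>$39.95</td></tr>"
changed = "<tr><td>Data on the Web</td><td>In stock</td><td>$39.95</td></tr>"
other_site = "<p>Data on the Web, now only $35.00</p>"

for name, page in [("original", original), ("changed", changed), ("other", other_site)]:
    print(name, extract_by_position(page), extract_by_pattern(page))
```

On the changed page the position-based extractor returns the wrong cell (a lack of resilience), and on the other site it returns nothing (a lack of adaptiveness). The pattern-based extractor finds the price in all three cases because it depends on the content rather than the layout.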
Summary of Qualitative Analysis
Graphical Perspective of Qualitative Analysis
X: the information extraction system has the capability.
X*: the system has the capability as long as the training corpus contains the required training data.
?: the system has the capability to some degree.
*: the extraction pattern itself does not show the capability, but the overall system has it.

Nested data Free Resilient Permutations Missing items Multi-slot Single-slot Semi Structure Name
X X X ? X X ? ? X X X X ROADRUNNER
X X X AutoSlog
X X X X X X X BYU Onto
? X* X X X X X WHISK
? X X X X X SRV
? X X X X X RAPIER
X X * X X X STALKER
X* X X X X X SoftMealy
X X X WIEN