Rule-based Information Extraction is Dead! Long Live Rule-based Information Extraction Systems!



  1. Rule-based Information Extraction is DEAD! Long Live Rule-based Information Extraction Systems!
     Laura Chiticariu, Yunyao Li, Frederick Reiss
     IBM Research - Almaden

     THE DISCONNECT: ACADEMIC vs. INDUSTRY

     Implementations of Entity Extraction

     [Figure: Entity Extraction Papers by Year. Fraction of NLP papers by year of publication, split into rule-based, hybrid, and machine learning based.]

     NLP Papers (2003-2012):
     • Rule-based: 3.5%
     • Hybrid: 21%
     • Machine learning based: 75%

     Commercial Products (2013), All Vendors:
     • Rule-based: 45%
     • Hybrid: 22%
     • Machine learning based: 33%

     Commercial Products (2013), Large Vendors:
     • Rule-based: 67%
     • Hybrid: 17%
     • Machine learning based: 17%

     THE EXPLANATIONS

     Rule-based IE

     PROs
     • Declarative
     • Easy to comprehend
     • Easy to maintain
     • Easy to incorporate domain knowledge
     • Easy to debug

     CONs
     • Heuristic
     • Requires tedious manual labor

     ML-based IE

     PROs
     • Trainable
     • Adaptable
     • Reduces manual effort

     CONs
     • Requires labeled data
     • Requires retraining for domain adaptation
     • Requires ML expertise to use or maintain
     • Opaque

     Evaluating Benefits of IE
     • Academia: evaluating IE on its own, using precision and recall
     • Industry: evaluating IE as part of a larger process, using ill-defined metrics that are subject to change

     Evaluating Costs of IE
     • Academia: labor cost of writing rules
     • Industry: labor cost, hardware cost, business risk, others

     What's the research in rule-based IE?

     BRIDGING THE GAP

     Where is the research in rule-based IE? Making it more principled, effective, and efficient.

     Define a standard IE rule language and data model.
     • What is the right data model to capture text, annotations over text, and their properties?
     • Can we establish a standard declarative extensible rule language to solve most IE tasks encountered so far?

     Systems research based on a standard IE rule language.
     • Data representation
     • Automatic performance optimization
     • Exploring modern hardware
     • …

     ML research based on a standard IE rule language.
     • How to learn basic primitives such as regular expressions and dictionaries?
     • How to automatically generate rules that are understandable and maintainable?
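
To make the poster's vocabulary concrete, here is a minimal sketch of the two rule primitives it names (regular expressions and dictionaries) applied over a simple span-based data model. Everything in it is an illustrative assumption and not the authors' system or rule language: the `Span` type, the `ORG_DICTIONARY` entries, the `PHONE_PATTERN` expression, and the function names are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical annotation: a labeled character range over the input text."""
    begin: int
    end: int
    label: str
    text: str


# Hypothetical dictionary primitive: a hand-curated list of organization names.
ORG_DICTIONARY = {"IBM Research", "IBM"}

# Hypothetical regular-expression primitive: phone numbers written as 3-3-4 digits.
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def dictionary_rule(text, entries, label):
    """Emit one span for every whole-word occurrence of a dictionary entry."""
    spans = []
    for entry in entries:
        for match in re.finditer(r"\b" + re.escape(entry) + r"\b", text):
            spans.append(Span(match.start(), match.end(), label, match.group()))
    return spans


def regex_rule(text, pattern, label):
    """Emit one span for every match of a compiled regular expression."""
    return [Span(m.start(), m.end(), label, m.group()) for m in pattern.finditer(text)]


def consolidate(spans):
    """Drop spans fully contained in an earlier, longer span (e.g. 'IBM' inside 'IBM Research')."""
    ordered = sorted(spans, key=lambda s: (s.begin, -(s.end - s.begin)))
    kept = []
    for span in ordered:
        if kept and span.end <= kept[-1].end:
            continue  # contained in the most recently kept span
        kept.append(span)
    return kept


def extract(text):
    """Run both rules and return consolidated spans in document order."""
    spans = dictionary_rule(text, ORG_DICTIONARY, "Organization")
    spans += regex_rule(text, PHONE_PATTERN, "Phone")
    return consolidate(spans)


if __name__ == "__main__":
    for span in extract("Call IBM Research at 408-555-0100."):
        print(span)
```

Each rule in this sketch is a small, readable function whose output can be traced back to a single dictionary entry or pattern, which is the kind of transparency the poster lists among the advantages of rule-based IE. The cost it lists is visible too: the dictionary and the pattern must be written and maintained by hand.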