Rule-based Information Extraction is Dead! Long Live Rule-based Information Extraction Systems!
Laura Chiticariu, Yunyao Li, Frederick Reiss
IBM Research - Almaden
THE DISCONNECT: ACADEMIC vs. INDUSTRY
[Figure panels: "Implementations of Entity Extraction" and "Entity Extraction Papers by Year". Axis labels: Fraction of NLP Papers, Year of Publication.]
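Rule-based IE, pros:
• Easy to comprehend
• Easy to maintain
• Easy to incorporate domain knowledge
• Easy to debug
ML-based IE, pros:
• Reduces manual effort
Rule-based IE, cons:
• Requires tedious manual labor
ML-based IE, cons:
• Requires labeled data
• Requires retraining for domain adaptation
• Requires ML expertise to use or maintain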
Evaluating IE on its own
Precision and Recall
Labor cost of writing
Evaluating IE as part of a larger process
Using ill-defined metrics that are subject to change
What’s the research in
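As a reminder of what evaluating IE on its own with precision and recall means in practice, here is a minimal sketch. It assumes exact-match scoring over (begin, end, type) tuples; the function name, the tuple layout, and the example spans are illustrative assumptions, not something the poster specifies.

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall over sets of extracted spans.

    Each span is a hypothetical (begin, end, type) tuple; a prediction
    counts as correct only if all three fields match a gold span.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Illustrative data: three gold spans, two predictions, one of them correct.
gold = {(0, 9, "Person"), (14, 29, "Person"), (33, 40, "Location")}
predicted = {(0, 9, "Person"), (14, 23, "Person")}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Scores of this kind are well defined on a fixed benchmark, which is the contrast with metrics that are ill-defined and subject to change when IE is evaluated as part of a larger process.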
BRIDGING THE GAP
Where is the research in rule-based IE? Making it more principled, effective, and efficient
Define a standard IE rule language and data model.
• What is the right data model to capture text, annotations over text, and their properties?
• Can we establish a standard declarative extensible rule language to solve most IE tasks encountered so far?
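One possible shape for such a data model, sketched under stated assumptions: annotations are character-offset spans over the text, each with a type and a property map, and rules are built from a small number of primitives (dictionary match, regular-expression match, and a "follows" combinator). The class and function names below are hypothetical and do not correspond to any particular rule language.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Span:
    """An annotation over text: character offsets, a type, and properties."""
    begin: int
    end: int
    type: str
    props: dict = field(default_factory=dict)


def dictionary_matches(text, entries, span_type):
    """Primitive 1: find whole-word occurrences of dictionary entries."""
    spans = []
    for entry in entries:
        for m in re.finditer(r"\b" + re.escape(entry) + r"\b", text):
            spans.append(Span(m.start(), m.end(), span_type, {"source": "dictionary"}))
    return sorted(spans, key=lambda s: s.begin)


def regex_matches(text, pattern, span_type):
    """Primitive 2: find matches of a regular expression."""
    return [Span(m.start(), m.end(), span_type, {"source": "regex"})
            for m in re.finditer(pattern, text)]


def follows(text, left_spans, right_spans, span_type):
    """Combinator: merge a left span with a right span separated only by whitespace."""
    merged = []
    for left in left_spans:
        for right in right_spans:
            gap = text[left.end:right.begin]
            if left.end < right.begin and gap.isspace():
                merged.append(Span(left.begin, right.end, span_type))
    return merged


text = "Yunyao Li met Frederick Reiss in Almaden."
first_names = dictionary_matches(text, ["Yunyao", "Frederick"], "FirstName")
caps_words = regex_matches(text, r"\b[A-Z][a-z]+\b", "CapsWord")

# Rule: a Person is a FirstName immediately followed by a capitalized word.
persons = follows(text, first_names, caps_words, "Person")
names = [text[s.begin:s.end] for s in persons]
print(names)
```

Keeping every intermediate result as a list of typed spans is what makes a rule easy to trace: each output span can be followed back to the dictionary entry or pattern that produced it.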
Systems research based on a standard IE rule language.
• Data representation
• Automatic performance optimization
• Exploring modern hardware …
ML research based on a standard IE rule language.
• How to learn basic primitives such as regular expressions and dictionaries?
• How to automatically generate rules that are understandable and maintainable?
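To make the first question concrete, the sketch below induces a dictionary from token-level labeled data with a deliberately naive frequency heuristic. This is an assumption chosen for illustration, not a method proposed on the poster: the function name, the thresholds, and the toy corpus are all hypothetical.

```python
from collections import Counter


def learn_dictionary(labeled_docs, min_support=2, min_precision=0.8):
    """Naive dictionary induction from labeled tokens.

    labeled_docs is a list of (tokens, labels) pairs, where labels[i] is True
    if tokens[i] belongs to a target entity. A token enters the dictionary if
    it is labeled as an entity at least min_support times and at least
    min_precision of its occurrences are entity occurrences.
    """
    positive = Counter()
    total = Counter()
    for tokens, labels in labeled_docs:
        for token, is_entity in zip(tokens, labels):
            total[token] += 1
            if is_entity:
                positive[token] += 1
    return sorted(
        token
        for token, count in positive.items()
        if count >= min_support and count / total[token] >= min_precision
    )


# Toy corpus: "Apple" is ambiguous (company vs. fruit), "IBM" is not.
labeled_docs = [
    (["IBM", "hired", "Apple", "engineers"], [True, False, True, False]),
    (["Apple", "and", "IBM", "compete"], [True, False, True, False]),
    (["an", "Apple", "a", "day"], [False, False, False, False]),
]

learned = learn_dictionary(labeled_docs)
print(learned)
```

The learned artifact is a plain word list, so a rule developer can read it, edit it, and plug it into a hand-written rule, which speaks to the second question about rules that stay understandable and maintainable.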