SlideShare a Scribd company logo
1 of 1
Download to read offline
Context-Free LanguagesRegular Languages
Information Extraction Grammars
ECIR 2015 Vienna, March 30th
Mónica Marrero
National Supercomputing Center, Spain
Julián Urbano
Universitat Pompeu Fabra, Spain
Problem: Grammar-based Named Entity (NE) Recognition Patterns
Features
Part of speech
Case
Gazetteers
Stem
[etc.]
(Semi-)automatic Learning Method
More than
one feature?
Regular Cascade Context-free
Natural/Markup
Lang. expressiveness?
Regular Cascade Context-free
Avoid extra
ambiguity?
Regular Cascade Context-free
Regular
Expressions
Cascade
Grammars
Context-Free
Grammars
Human-readable and based on standards
NE: Person NE: Time NE: Location
Information Extraction systems should be capable of adapting to different entities and domains.
How can we decide what is the best model for a Named Entity Recognition system?
Proposal: Information Extraction Grammars for Named Entity Recognition
Formally, 𝐼𝐸𝐺 = (𝒱, 𝑆, Σ, 𝒫, 𝒞)
𝒱: set of non-terminals
𝑆 ∈ 𝒱: initial symbol
Σ: input alphabet
𝒫: set of production rules
𝒞: set of condition sets assigned to non-terminals,
expressed as function-value pairs 𝑓, 𝑦
All derivations must meet:
𝐴
∗ 𝐼𝐸𝐺
𝜔 ≔ 𝐴
∗ 𝐺
𝜔 and ∀ 𝑓, 𝑦 ∈ 𝒞 𝐴 ∶ 𝑓 𝜔 = 𝑦
Context-Free
Grammar 𝐺
IEG for the recognition of full person names
using First/Last name gazetteers
𝑆 → 𝐹𝐿𝐿 𝑆 → 𝐹𝐿 𝑆 → 𝐹
𝐹 → 𝑇 𝐿 → 𝑇 𝑇 → [a-zA-Z0-9]+
𝒞 𝐹 = 𝐹𝑖𝑟𝑠𝑡𝐺𝑎𝑧, 𝑡𝑟𝑢𝑒 , 𝐶𝑎𝑠𝑒, 𝑢𝑝𝑝𝑒𝑟 , 𝑃𝑂𝑆, 𝑁𝑃
𝒞 𝐿 = 𝐹𝑖𝑟𝑠𝑡𝐺𝑎𝑧, 𝑡𝑟𝑢𝑒 , 𝐶𝑎𝑠𝑒, 𝑢𝑝𝑝𝑒𝑟 , 𝑃𝑂𝑆, 𝑁𝑃
Lisa Brown Smith will present at 4 pm in Foyer room
Similar to synthesized attributes in S-attributed grammars, but in this case
the values of the attributes are given upfront and they are used to constrain the parsing
Computational Complexity
Regular Expression
O(ns2)
Cascade Grammar
O(mns2)
IEG
O(n(tm+s2))
Context-Free Grammar
O(n3)
IEG
O(n3)
Sizes of n: input, m: features, s: states in the automata, t: non-terminals with conditions associated
Summary and Future Work
• Information Extraction Grammars
- Based on standards
- Expressiveness of context-free grammars
- Support for custom features
- Competitive complexity using standard
recognition methods
• Contributes to the flexibility of Information
Extraction tools that can work independently of
the kind of features and the expressiveness of the
language to recognize
• Future work: optimization of the recognition
methods and use of probabilities in the conditions

More Related Content

Viewers also liked

SPC - Statistical process control
SPC  - Statistical process controlSPC  - Statistical process control
SPC - Statistical process control
Senthil kumar
 
Management information system
Management  information systemManagement  information system
Management information system
Ramya Sree
 
Management information system
Management information systemManagement information system
Management information system
Anamika Sonawane
 

Viewers also liked (11)

Data and Information Visualization: the Principles of Infographics - English ...
Data and Information Visualization: the Principles of Infographics - English ...Data and Information Visualization: the Principles of Infographics - English ...
Data and Information Visualization: the Principles of Infographics - English ...
 
Mark Harrison SPC Implementation
Mark Harrison   SPC ImplementationMark Harrison   SPC Implementation
Mark Harrison SPC Implementation
 
SPC - Statistical process control
SPC  - Statistical process controlSPC  - Statistical process control
SPC - Statistical process control
 
Information Extraction
Information ExtractionInformation Extraction
Information Extraction
 
Data, Information And Knowledge Management Framework And The Data Management ...
Data, Information And Knowledge Management Framework And The Data Management ...Data, Information And Knowledge Management Framework And The Data Management ...
Data, Information And Knowledge Management Framework And The Data Management ...
 
Management Information System (Full Notes)
Management Information System (Full Notes)Management Information System (Full Notes)
Management Information System (Full Notes)
 
Management Information System (MIS)
Management Information System (MIS)Management Information System (MIS)
Management Information System (MIS)
 
Management information system
Management information systemManagement information system
Management information system
 
Management information system
Management  information systemManagement  information system
Management information system
 
Management information system
Management information systemManagement information system
Management information system
 
Types Of Information Systems
Types Of Information SystemsTypes Of Information Systems
Types Of Information Systems
 

More from Julián Urbano

Statistical Significance Testing in Information Retrieval: An Empirical Analy...
Statistical Significance Testing in Information Retrieval: An Empirical Analy...Statistical Significance Testing in Information Retrieval: An Empirical Analy...
Statistical Significance Testing in Information Retrieval: An Empirical Analy...
Julián Urbano
 
Validity and Reliability of Cranfield-like Evaluation in Information Retrieval
Validity and Reliability of Cranfield-like Evaluation in Information RetrievalValidity and Reliability of Cranfield-like Evaluation in Information Retrieval
Validity and Reliability of Cranfield-like Evaluation in Information Retrieval
Julián Urbano
 
On the Measurement of Test Collection Reliability
On the Measurement of Test Collection ReliabilityOn the Measurement of Test Collection Reliability
On the Measurement of Test Collection Reliability
Julián Urbano
 
How Significant is Statistically Significant? The case of Audio Music Similar...
How Significant is Statistically Significant? The case of Audio Music Similar...How Significant is Statistically Significant? The case of Audio Music Similar...
How Significant is Statistically Significant? The case of Audio Music Similar...
Julián Urbano
 

More from Julián Urbano (20)

Statistical Significance Testing in Information Retrieval: An Empirical Analy...
Statistical Significance Testing in Information Retrieval: An Empirical Analy...Statistical Significance Testing in Information Retrieval: An Empirical Analy...
Statistical Significance Testing in Information Retrieval: An Empirical Analy...
 
Your PhD and You
Your PhD and YouYour PhD and You
Your PhD and You
 
Statistical Analysis of Results in Music Information Retrieval: Why and How
Statistical Analysis of Results in Music Information Retrieval: Why and HowStatistical Analysis of Results in Music Information Retrieval: Why and How
Statistical Analysis of Results in Music Information Retrieval: Why and How
 
The Treatment of Ties in AP Correlation
The Treatment of Ties in AP CorrelationThe Treatment of Ties in AP Correlation
The Treatment of Ties in AP Correlation
 
A Plan for Sustainable MIR Evaluation
A Plan for Sustainable MIR EvaluationA Plan for Sustainable MIR Evaluation
A Plan for Sustainable MIR Evaluation
 
Crawling the Web for Structured Documents
Crawling the Web for Structured DocumentsCrawling the Web for Structured Documents
Crawling the Web for Structured Documents
 
How Do Gain and Discount Functions Affect the Correlation between DCG and Use...
How Do Gain and Discount Functions Affect the Correlation between DCG and Use...How Do Gain and Discount Functions Affect the Correlation between DCG and Use...
How Do Gain and Discount Functions Affect the Correlation between DCG and Use...
 
A Comparison of the Optimality of Statistical Significance Tests for Informat...
A Comparison of the Optimality of Statistical Significance Tests for Informat...A Comparison of the Optimality of Statistical Significance Tests for Informat...
A Comparison of the Optimality of Statistical Significance Tests for Informat...
 
MIREX 2010 Symbolic Melodic Similarity: Local Alignment with Geometric Repres...
MIREX 2010 Symbolic Melodic Similarity: Local Alignment with Geometric Repres...MIREX 2010 Symbolic Melodic Similarity: Local Alignment with Geometric Repres...
MIREX 2010 Symbolic Melodic Similarity: Local Alignment with Geometric Repres...
 
The University Carlos III of Madrid at TREC 2011 Crowdsourcing Track
The University Carlos III of Madrid at TREC 2011 Crowdsourcing TrackThe University Carlos III of Madrid at TREC 2011 Crowdsourcing Track
The University Carlos III of Madrid at TREC 2011 Crowdsourcing Track
 
What is the Effect of Audio Quality on the Robustness of MFCCs and Chroma Fea...
What is the Effect of Audio Quality on the Robustness of MFCCs and Chroma Fea...What is the Effect of Audio Quality on the Robustness of MFCCs and Chroma Fea...
What is the Effect of Audio Quality on the Robustness of MFCCs and Chroma Fea...
 
Evaluation in (Music) Information Retrieval through the Audio Music Similarit...
Evaluation in (Music) Information Retrieval through the Audio Music Similarit...Evaluation in (Music) Information Retrieval through the Audio Music Similarit...
Evaluation in (Music) Information Retrieval through the Audio Music Similarit...
 
Symbolic Melodic Similarity (through Shape Similarity)
Symbolic Melodic Similarity (through Shape Similarity)Symbolic Melodic Similarity (through Shape Similarity)
Symbolic Melodic Similarity (through Shape Similarity)
 
Evaluation in Audio Music Similarity
Evaluation in Audio Music SimilarityEvaluation in Audio Music Similarity
Evaluation in Audio Music Similarity
 
Validity and Reliability of Cranfield-like Evaluation in Information Retrieval
Validity and Reliability of Cranfield-like Evaluation in Information RetrievalValidity and Reliability of Cranfield-like Evaluation in Information Retrieval
Validity and Reliability of Cranfield-like Evaluation in Information Retrieval
 
On the Measurement of Test Collection Reliability
On the Measurement of Test Collection ReliabilityOn the Measurement of Test Collection Reliability
On the Measurement of Test Collection Reliability
 
How Significant is Statistically Significant? The case of Audio Music Similar...
How Significant is Statistically Significant? The case of Audio Music Similar...How Significant is Statistically Significant? The case of Audio Music Similar...
How Significant is Statistically Significant? The case of Audio Music Similar...
 
Towards Minimal Test Collections for Evaluation of Audio Music Similarity and...
Towards Minimal Test Collections for Evaluation of Audio Music Similarity and...Towards Minimal Test Collections for Evaluation of Audio Music Similarity and...
Towards Minimal Test Collections for Evaluation of Audio Music Similarity and...
 
The University Carlos III of Madrid at TREC 2011 Crowdsourcing Track: Noteboo...
The University Carlos III of Madrid at TREC 2011 Crowdsourcing Track: Noteboo...The University Carlos III of Madrid at TREC 2011 Crowdsourcing Track: Noteboo...
The University Carlos III of Madrid at TREC 2011 Crowdsourcing Track: Noteboo...
 
Information Retrieval Meta-Evaluation: Challenges and Opportunities in the Mu...
Information Retrieval Meta-Evaluation: Challenges and Opportunities in the Mu...Information Retrieval Meta-Evaluation: Challenges and Opportunities in the Mu...
Information Retrieval Meta-Evaluation: Challenges and Opportunities in the Mu...
 

Recently uploaded

Tuberculosis (TB)-Notes.pdf microbiology notes
Tuberculosis (TB)-Notes.pdf microbiology notesTuberculosis (TB)-Notes.pdf microbiology notes
Tuberculosis (TB)-Notes.pdf microbiology notes
jyothisaisri
 

Recently uploaded (20)

Lubrication System in forced feed system
Lubrication System in forced feed systemLubrication System in forced feed system
Lubrication System in forced feed system
 
Harry Coumnas Thinks That Human Teleportation is Possible in Quantum Mechanic...
Harry Coumnas Thinks That Human Teleportation is Possible in Quantum Mechanic...Harry Coumnas Thinks That Human Teleportation is Possible in Quantum Mechanic...
Harry Coumnas Thinks That Human Teleportation is Possible in Quantum Mechanic...
 
Introduction and significance of Symbiotic algae
Introduction and significance of  Symbiotic algaeIntroduction and significance of  Symbiotic algae
Introduction and significance of Symbiotic algae
 
POST TRANSCRIPTIONAL GENE SILENCING-AN INTRODUCTION.pptx
POST TRANSCRIPTIONAL GENE SILENCING-AN INTRODUCTION.pptxPOST TRANSCRIPTIONAL GENE SILENCING-AN INTRODUCTION.pptx
POST TRANSCRIPTIONAL GENE SILENCING-AN INTRODUCTION.pptx
 
MODERN PHYSICS_REPORTING_QUANTA_.....pdf
MODERN PHYSICS_REPORTING_QUANTA_.....pdfMODERN PHYSICS_REPORTING_QUANTA_.....pdf
MODERN PHYSICS_REPORTING_QUANTA_.....pdf
 
NuGOweek 2024 programme final FLYER short.pdf
NuGOweek 2024 programme final FLYER short.pdfNuGOweek 2024 programme final FLYER short.pdf
NuGOweek 2024 programme final FLYER short.pdf
 
VILLAGE ATTACHMENT For rural agriculture PPT.pptx
VILLAGE ATTACHMENT For rural agriculture  PPT.pptxVILLAGE ATTACHMENT For rural agriculture  PPT.pptx
VILLAGE ATTACHMENT For rural agriculture PPT.pptx
 
Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...
Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...
Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...
 
Tuberculosis (TB)-Notes.pdf microbiology notes
Tuberculosis (TB)-Notes.pdf microbiology notesTuberculosis (TB)-Notes.pdf microbiology notes
Tuberculosis (TB)-Notes.pdf microbiology notes
 
GBSN - Microbiology (Unit 5) Concept of isolation
GBSN - Microbiology (Unit 5) Concept of isolationGBSN - Microbiology (Unit 5) Concept of isolation
GBSN - Microbiology (Unit 5) Concept of isolation
 
Abortion uae unmarried price +27791653574 Contact Us Dubai Abu Dhabi Sharjah ...
Abortion uae unmarried price +27791653574 Contact Us Dubai Abu Dhabi Sharjah ...Abortion uae unmarried price +27791653574 Contact Us Dubai Abu Dhabi Sharjah ...
Abortion uae unmarried price +27791653574 Contact Us Dubai Abu Dhabi Sharjah ...
 
GBSN - Biochemistry (Unit 8) Enzymology
GBSN - Biochemistry (Unit 8) EnzymologyGBSN - Biochemistry (Unit 8) Enzymology
GBSN - Biochemistry (Unit 8) Enzymology
 
In-pond Race way systems for Aquaculture (IPRS).pptx
In-pond Race way systems for Aquaculture (IPRS).pptxIn-pond Race way systems for Aquaculture (IPRS).pptx
In-pond Race way systems for Aquaculture (IPRS).pptx
 
Information science research with large language models: between science and ...
Information science research with large language models: between science and ...Information science research with large language models: between science and ...
Information science research with large language models: between science and ...
 
short range interaction for protein and factors influencing or affecting the ...
short range interaction for protein and factors influencing or affecting the ...short range interaction for protein and factors influencing or affecting the ...
short range interaction for protein and factors influencing or affecting the ...
 
dkNET Webinar: The 4DN Data Portal - Data, Resources and Tools to Help Elucid...
dkNET Webinar: The 4DN Data Portal - Data, Resources and Tools to Help Elucid...dkNET Webinar: The 4DN Data Portal - Data, Resources and Tools to Help Elucid...
dkNET Webinar: The 4DN Data Portal - Data, Resources and Tools to Help Elucid...
 
RACEMIzATION AND ISOMERISATION completed.pptx
RACEMIzATION AND ISOMERISATION completed.pptxRACEMIzATION AND ISOMERISATION completed.pptx
RACEMIzATION AND ISOMERISATION completed.pptx
 
Adaptive Restore algorithm & importance Monte Carlo
Adaptive Restore algorithm & importance Monte CarloAdaptive Restore algorithm & importance Monte Carlo
Adaptive Restore algorithm & importance Monte Carlo
 
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
 
TEST BANK for Organic Chemistry 6th Edition.pdf
TEST BANK for Organic Chemistry 6th Edition.pdfTEST BANK for Organic Chemistry 6th Edition.pdf
TEST BANK for Organic Chemistry 6th Edition.pdf
 

Information Extraction Grammars

  • 1. Context-Free LanguagesRegular Languages Information Extraction Grammars ECIR 2015 Vienna, March 30th Mónica Marrero National Supercomputing Center, Spain Julián Urbano Universitat Pompeu Fabra, Spain Problem: Grammar-based Named Entity (NE) Recognition Patterns Features Part of speech Case Gazetteers Stem [etc.] (Semi-)automatic Learning Method More than one feature? Regular Cascade Context-free Natural/Markup Lang. expressiveness? Regular Cascade Context-free Avoid extra ambiguity? Regular Cascade Context-free Regular Expressions Cascade Grammars Context-Free Grammars Human-readable and based on standards NE: Person NE: Time NE: Location Information Extraction systems should be capable of adapting to different entities and domains. How can we decide what is the best model for a Named Entity Recognition system? Proposal: Information Extraction Grammars for Named Entity Recognition Formally, 𝐼𝐸𝐺 = (𝒱, 𝑆, Σ, 𝒫, 𝒞) 𝒱: set of non-terminals 𝑆 ∈ 𝒱: initial symbol Σ: input alphabet 𝒫: set of production rules 𝒞: set of condition sets assigned to non-terminals, expressed as function-value pairs 𝑓, 𝑦 All derivations must meet: 𝐴 ∗ 𝐼𝐸𝐺 𝜔 ≔ 𝐴 ∗ 𝐺 𝜔 and ∀ 𝑓, 𝑦 ∈ 𝒞 𝐴 ∶ 𝑓 𝜔 = 𝑦 Context-Free Grammar 𝐺 IEG for the recognition of full person names using First/Last name gazetteers 𝑆 → 𝐹𝐿𝐿 𝑆 → 𝐹𝐿 𝑆 → 𝐹 𝐹 → 𝑇 𝐿 → 𝑇 𝑇 → [a-zA-Z0-9]+ 𝒞 𝐹 = 𝐹𝑖𝑟𝑠𝑡𝐺𝑎𝑧, 𝑡𝑟𝑢𝑒 , 𝐶𝑎𝑠𝑒, 𝑢𝑝𝑝𝑒𝑟 , 𝑃𝑂𝑆, 𝑁𝑃 𝒞 𝐿 = 𝐹𝑖𝑟𝑠𝑡𝐺𝑎𝑧, 𝑡𝑟𝑢𝑒 , 𝐶𝑎𝑠𝑒, 𝑢𝑝𝑝𝑒𝑟 , 𝑃𝑂𝑆, 𝑁𝑃 Lisa Brown Smith will present at 4 pm in Foyer room Similar to synthesized attributes in S-attributed grammars, but in this case the values of the attributes are given upfront and they are used to constrain the parsing Computational Complexity Regular Expression O(ns2) Cascade Grammar O(mns2) IEG O(n(tm+s2)) Context-Free Grammar O(n3) IEG O(n3) Sizes of n: input, m: features, s: states in the automata, t: non-terminals with conditions associated Summary and Future Work • Information Extraction Grammars - Based on standards - Expressiveness of context-free grammars - Support for custom features - Competitive complexity using standard recognition methods • Contributes to the flexibility of Information Extraction tools that can work independently of the kind of features and the expressiveness of the language to recognize • Future work: optimization of the recognition methods and use of probabilities in the conditions