On the Definition of Patterns for Semantic Annotation
Mónica Marrero, Julián Urbano, Jorge Morato and Sonia Sánchez-Cuadrado 
University Carlos III of Madrid, Computer Science Department 
mmarrero@inf.uc3m.es, jurbano@inf.uc3m.es, jmorato@inf.uc3m.es, ssanchec@ie.inf.uc3m.es
[Figure labels: Today; Semantic Annotation of Web Resources; Semantic Web]
Automatic annotation tools and pattern models 
Automatic and semi-automatic annotation tools help make the annotation process scalable by using patterns.
Because the patterns sit one level before the annotation itself, extraction patterns are
more flexible and effective when documents change: only the patterns,
rather than all annotations, need to be modified. But some issues arise:
The Web is very dynamic… 
Can we modify and reuse 
these patterns? 
The Web has very diverse 
contents… What elements should 
these patterns recognize? 
Based on what features? 
The Web is huge… 
How can we reduce 
the cost of annotating? 
Non-human-readable or 
complex patterns are 
harder to modify and hence 
harder to reuse 
To be reused, patterns
should be
modifiable and modular
Context-free grammars are capable of
recognizing virtually every natural
language construction, whereas bag-of-words
techniques, wrappers and
regular expressions are not
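The gap in expressive power can be illustrated with a toy case. The sketch below is not from the poster: it uses nested brackets as a stand-in for nested language constructions, and both the function name `balanced` and the pattern `FIXED_DEPTH` are hypothetical. A recursive-descent recognizer for the context-free grammar S = "(" S ")" S / "" accepts any nesting depth, while a regular expression written for two levels fails at three.

```python
import re


def balanced(text: str) -> bool:
    """Recognize the context-free grammar S = "(" S ")" S / "" ."""

    def parse_s(pos: int) -> int:
        # Return the position just after the longest S starting at pos,
        # or -1 if an opening bracket is never closed.
        while pos < len(text) and text[pos] == "(":
            pos = parse_s(pos + 1)
            if pos < 0 or pos >= len(text) or text[pos] != ")":
                return -1
            pos += 1
        return pos

    return parse_s(0) == len(text)


# A regular expression hand-written for at most two nesting levels.
# Supporting a further level means rewriting the pattern itself.
FIXED_DEPTH = re.compile(r"(\((\(\))*\))*")

for sample in ["(()())", "((()))", "(()"]:
    print(sample, balanced(sample), FIXED_DEPTH.fullmatch(sample) is not None)
```

The recursive rule handles arbitrary depth with no change to the grammar, which is the property the poster relies on when it prefers context-free grammars as the basis for patterns.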
The features most frequently modeled
are those related to the syntax,
semantics and format of the text.
New types of features usually require
modifying the schema
The creation of patterns should not be more 
expensive than manual annotation. The 
collaborative creation of patterns and their 
reuse could reduce costs. But the patterns 
have to be easily accessible first 
Standard web languages like OWL or XML 
would make the patterns easier to access, 
understand, manage (thanks to appropriate 
tools) and distribute, promoting their adoption 
Powerful, flexible, reusable, modifiable, 
modular, distributable and accessible 
pattern models 
More complexity in the definition of the pattern model 
The more complex the pattern model, the lower its adoption
Standardization reduces the problem, but how can we “create” one? 
Proposal 
Adaptation of SRGS 
for Information Extraction 
Semantic attribute
added to rule element
• Identifies the text semantics, typically a
concept of an ontology, by its URI
• The semantics associated with non-terminals
allow complex scenarios to be specified from
simple semantics (e.g. speaker, place and
time of a talk).
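As a rough illustration of the semantic attribute, the sketch below parses a small grammar and reads the concept URI attached to each rule. The attribute name `semantic`, the `example.org` URIs and the rule contents are assumptions made for this example, since the poster does not give the concrete syntax. The SRGS namespace and version attributes are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical IE-SRGS fragment: the "semantic" attribute name and the
# example.org concept URIs are assumptions, not taken from the poster.
IE_SRGS = """
<grammar root="talk">
  <rule id="speaker" semantic="http://example.org/onto#Speaker">
    <one-of><item>Alice</item><item>Bob</item></one-of>
  </rule>
  <rule id="place" semantic="http://example.org/onto#Place">
    <one-of><item>Room A</item><item>Room B</item></one-of>
  </rule>
  <rule id="talk" semantic="http://example.org/onto#Talk">
    <ruleref uri="#speaker"/><item>at</item><ruleref uri="#place"/>
  </rule>
</grammar>
"""


def rule_semantics(xml_text: str) -> dict:
    """Map each rule id to the concept URI in its semantic attribute."""
    root = ET.fromstring(xml_text.strip())
    return {rule.get("id"): rule.get("semantic") for rule in root.iter("rule")}


def component_concepts(xml_text: str, rule_id: str) -> list:
    """List the concepts of the rules referenced by the given rule, in order."""
    root = ET.fromstring(xml_text.strip())
    concepts = rule_semantics(xml_text)
    rule = next(r for r in root.iter("rule") if r.get("id") == rule_id)
    return [concepts[ref.get("uri").lstrip("#")] for ref in rule.iter("ruleref")]


print(rule_semantics(IE_SRGS))
print(component_concepts(IE_SRGS, "talk"))
```

Here the `talk` rule carries its own concept and is assembled from the simpler `speaker` and `place` rules, which mirrors the idea of building complex scenarios from simple semantics.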
Powerful enough to recognize context-free languages
Existence of formalizations and tools for management
ABNF
Standard language
• Semantic attribute of the rules 
• Additional operations to the 
alternatives: AND and NOT 
• Restriction functions in the rules 
IE-SRGS 
Adopt the Speech Recognition Grammar
Specification (SRGS), whose purpose is
to guide speech recognizers on the Web by
modeling the expected voice commands.
SRGS 
• XML language 
• Alternative weights 
• Repetition probabilities 
• Use of rules from other grammars 
• Grammar attributes 
• Strings as values 
• Repetition characters 
• Incremental alternatives 
• Grouping 
Bag of words 
Conclusions and Future Work 
The adaptation of the SRGS standard offers
powerful and flexible patterns. It also eases the
development of new patterns, because building on a
standard provides formalisms and tools and makes
existing patterns easy to distribute, reuse and
access.
Research on adapting the SRGS standard
to Information Extraction is ongoing work,
focused on the automatic generation
of such patterns from examples,
which would eventually lead to
fully automated semantic annotation.
We acknowledge the National Plan of Scientific Research, Development and 
Technological Innovation, which has funded this work through the research 
project TIN2007-67153. Pictures by 
Human-readable 
Web Standard 
expressed with XML 
Construct                 ABNF            XML (SRGS)
Rule definition           A = …           <grammar><rule id="A"> … </rule></grammar>
Alternative               A = a / b       <rule id="A"><one-of>
                          A =/ c            <item>a</item> …
                                          </one-of></rule>
Alternative weight        -               <item weight="n">a</item>
Repetition                <min>*<max>a    <item repeat="min-max">a</item>
                          <n>a
Repetition probability    -               <item repeat="min-max" repeat-prob="p">a</item>
Non-terminal reference    A = B C         <rule id="A">
                                            <ruleref uri="gram#B"/> …
                                          </rule>
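The ABNF-to-SRGS mappings in the table above can be sketched as a pair of small generator functions. This is a minimal illustration rather than a full converter: the helper names `alternatives_to_srgs` and `repetition_to_srgs` are hypothetical, and only the alternative, weight and repetition rows are covered.

```python
import xml.etree.ElementTree as ET


def alternatives_to_srgs(rule_id, alternatives, weights=None):
    """Map the ABNF rule 'A = a / b / ...' to <rule><one-of><item>...</item></one-of></rule>."""
    rule = ET.Element("rule", {"id": rule_id})
    one_of = ET.SubElement(rule, "one-of")
    for index, alternative in enumerate(alternatives):
        item = ET.SubElement(one_of, "item")
        if weights is not None:
            # Weights have no ABNF counterpart; they exist only on the SRGS side.
            item.set("weight", str(weights[index]))
        item.text = alternative
    return rule


def repetition_to_srgs(token, minimum, maximum):
    """Map the ABNF repetition '<min>*<max>token' to <item repeat="min-max">token</item>."""
    item = ET.Element("item", {"repeat": f"{minimum}-{maximum}"})
    item.text = token
    return item


print(ET.tostring(alternatives_to_srgs("A", ["a", "b", "c"]), encoding="unicode"))
print(ET.tostring(repetition_to_srgs("a", 2, 4), encoding="unicode"))
```

Building the elements with `xml.etree.ElementTree` rather than string concatenation keeps the output well-formed, which matters if the generated patterns are to be shared and processed with standard XML tools.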
AND and NOT elements
added as children of rule
• Boolean combination of non-terminals
• The AND operator allows diverse restrictions
(e.g. format, semantics, syntax, etc.) to be
specified syntactically by means of
vocabularies (e.g. named entity tags, syntax
tags, lemmas, HTML tags, characters, etc.)
• These operators can be especially useful for
techniques that perform some kind of learning
based on positive and negative examples
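The intended semantics of AND and NOT can be sketched with simple predicate combinators. This is a simplification under stated assumptions: the poster combines non-terminals that may span several tokens, whereas the example below works on single tokens, and the feature names (`pos`, `ner`) and tag values are hypothetical.

```python
from typing import Callable, Dict

Token = Dict[str, str]
Matcher = Callable[[Token], bool]


def feature(name: str, value: str) -> Matcher:
    """Match tokens whose feature `name` equals `value` (one vocabulary entry)."""
    return lambda token: token.get(name) == value


def and_(*matchers: Matcher) -> Matcher:
    """AND: the token must satisfy every combined matcher."""
    return lambda token: all(matcher(token) for matcher in matchers)


def not_(matcher: Matcher) -> Matcher:
    """NOT: the token must fail the wrapped matcher."""
    return lambda token: not matcher(token)


# A proper noun (syntax vocabulary) that is not tagged as a location
# (named entity vocabulary).
proper_noun_not_place = and_(feature("pos", "NNP"), not_(feature("ner", "LOCATION")))

tokens = [
    {"text": "Marrero", "pos": "NNP", "ner": "PERSON"},
    {"text": "Madrid", "pos": "NNP", "ner": "LOCATION"},
    {"text": "talks", "pos": "VBZ", "ner": "O"},
]
print([t["text"] for t in tokens if proper_noun_not_place(t)])
```

A learner that works from positive and negative examples could assemble such a matcher by putting features shared by the positives under AND and features seen only in the negatives under NOT.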
Restriction element
added as child of rule
• Identifies functions by their URI
• They can be web services or local functions
• The non-terminal accepts the text only if all
functions evaluate to true
• Not all restrictions can be expressed
syntactically (e.g. words in a gazetteer), and
some would be more complex and inefficient to
express that way (e.g. strong tags in HTML
could imply processing very large texts)
• They are variable, depending on the type of
document (e.g. strong in HTML or PDF)
• It is possible to create distributed
repositories of frequently used functions
for certain types of document
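The acceptance rule for restriction functions can be sketched as a registry that resolves URIs to callables. The registry, the `example.org` URIs, the gazetteer contents and the function name `accepts` are all assumptions for illustration. Only local functions are shown, although the poster notes that a URI could equally identify a web service.

```python
from typing import Callable, Dict, Iterable

# Hypothetical gazetteer and function URIs, used only for this example.
GAZETTEER = {"madrid", "getafe", "leganes"}

RESTRICTIONS: Dict[str, Callable[[str], bool]] = {
    "http://example.org/fn#inGazetteer": lambda text: text.lower() in GAZETTEER,
    "http://example.org/fn#capitalized": lambda text: text[:1].isupper(),
}


def accepts(text: str, restriction_uris: Iterable[str]) -> bool:
    """Accept the text only if every referenced restriction function returns True."""
    return all(RESTRICTIONS[uri](text) for uri in restriction_uris)


place_restrictions = [
    "http://example.org/fn#inGazetteer",
    "http://example.org/fn#capitalized",
]
for candidate in ["Madrid", "madrid", "Paris"]:
    print(candidate, accepts(candidate, place_restrictions))
```

Keeping the lookup behind a URI means the same rule can be paired with a different implementation per document type, for instance one notion of "strong" text for HTML and another for PDF.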
Based on ABNF (Augmented Backus-Naur Form) but more powerful
Well-defined and accepted DTD to map ABNF constructions to XML (see table with ABNF-SRGS mappings)
Can combine rules with references to rules from other grammars
BNF 
Wrapper 
Regular expression
