The document discusses cross-lingual ontology translation and lexicalization. It presents the lemon model for connecting ontology concepts to lexical information to facilitate tasks like machine translation. The lemon model represents lexical entries, forms, linguistic structure, meanings, and syntactic frames. It separates ontological semantics from lexical features to enable linking terminology to external resources for translation. The model supports representing multilingual labels and relating terms through concepts like narrower/broader. This enables cross-lingual information extraction and search over linked data.
Author Credits - Maaz Nomani
A Proposition Bank is a collection of sentences hand-annotated with semantic role labels. Currently, around 10,000 sentences containing 0.2 million words have been hand-annotated with semantic label information.
This is a natural language resource of very rich linguistic information which can be used in a variety of NLP applications such as semantic parsing, syntactic parsing, sentiment analysis, dialogue systems etc.
In this paper, we present one such resource for Urdu, a resource-poor Indian language. The Proposition Bank of Urdu is built on top of the existing Urdu Treebank (a Treebank is a corpus of sentences annotated with POS, morphological, head, TAM, and dependency label information). A PropBank adds a layer of semantic information over this Treebank and can therefore facilitate semantic parsing and other semantic-level operations on natural language sentences.
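As an illustration of how a PropBank layer sits on top of treebank annotation, consider the following sketch; the record format, example sentence, and frame id are invented for illustration, not the actual Urdu PropBank format:

```python
# Illustrative sketch only (not the actual Urdu PropBank file format):
# a treebank layer of token records plus a PropBank-style layer that
# marks the predicate and its numbered arguments. The example sentence
# 'Ali kitaab paRhtaa' ("Ali reads a book") and the frame id are invented.
treebank_tokens = [
    {"id": 1, "form": "Ali", "pos": "NNP", "head": 3, "deprel": "k1"},
    {"id": 2, "form": "kitaab", "pos": "NN", "head": 3, "deprel": "k2"},
    {"id": 3, "form": "paRhtaa", "pos": "VM", "head": 0, "deprel": "root"},
]

propbank_layer = {
    "predicate": 3,                       # token id of the verb
    "frame": "paRh.01",                   # invented frame identifier
    "args": {"ARG0": [1], "ARG1": [2]},   # ARG0 = agent, ARG1 = patient
}

def arg_text(layer, tokens, label):
    """Surface string of a PropBank argument, joining its token forms."""
    forms = {t["id"]: t["form"] for t in tokens}
    return " ".join(forms[i] for i in layer["args"][label])

print(arg_text(propbank_layer, treebank_tokens, "ARG0"))  # Ali
```

The point of the layered design is that the semantic layer only references token ids, so it can be added without modifying the underlying treebank annotation.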
In this talk I intend to review some basic and high-level concepts: formal languages, grammars, and ontologies. Languages transmit knowledge from a sender to a receiver; grammars formally specify languages; ontologies are formal specifications of specific knowledge domains. After this introductory revision, highlighting the role of each of these elements in the context of computer-based problem solving (programming), I will talk about a project aimed at automatically inferring and generating a grammar for a Domain Specific Language (DSL) from a given ontology that describes the domain. The transformation rules will be presented, and the system, Onto2Gra, which fully implements this "Ontological approach for DSL development", will be introduced.
Semantic Rules Representation in Controlled Natural Language in FluentEditor (Cognitum)
Abstract. The purpose of this paper is to present a way of representing semantic rules (SWRL) in a controlled natural language (English) in order to make the rules easier to understand for humans interacting with a machine. The rule representation is implemented in FluentEditor, an ontology editor with a controlled natural language (CNL). The representation can be used in many domains where people interact with machines and use specialized interfaces to define knowledge in a system (a semantic knowledge base), e.g. representing medical knowledge and guidelines, procedures in crisis management, or the management of any coordination process. Such knowledge bases can support decision making in any discipline, provided the knowledge is stored in a proper semantic form.
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL... (ijaia)
Much automatic translation work has addressed major European language pairs by taking advantage of large-scale parallel corpora, but very little research has been conducted on the Amharic-Arabic language pair due to the scarcity of parallel data. Moreover, no benchmark parallel Amharic-Arabic text corpus is available for the machine translation task. Therefore, a small parallel Quranic text corpus was constructed from the existing monolingual Arabic text and its equivalent Amharic translation available from Tanzil. Experiments are carried out on two Neural Machine Translation (NMT) models, based on Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), using an attention-based encoder-decoder architecture adapted from the open-source OpenNMT system. The LSTM-based and GRU-based NMT models and the Google Translate system are compared; LSTM-based OpenNMT outperforms GRU-based OpenNMT and Google Translate, with BLEU scores of 12%, 11%, and 6%, respectively.
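For readers unfamiliar with the metric, the BLEU comparison above can be illustrated with a toy implementation of modified n-gram precision plus a brevity penalty; real evaluations use a standard, smoothed, corpus-level implementation, and the example sentences here are invented:

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Toy sentence-level BLEU: geometric mean of modified n-gram
    precisions times a brevity penalty. Standard evaluations use a
    corpus-level, properly smoothed implementation instead."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i+n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i+n]) for i in range(len(ref) - n + 1))
        overlap = sum((cand_ngrams & ref_ngrams).values())  # clipped counts
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)       # crude smoothing
    # brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(round(bleu("the cat sat on the mat", "the cat sat on the mat"), 2))  # 1.0
```

A perfect match scores 1.0; the 12% / 11% / 6% figures in the abstract correspond to scores of 0.12, 0.11, and 0.06 on this scale.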
Material of the 4th Intensive Summer school and collaborative workshop on Natural Language Processing (NAIST Franco-Thai Workshop 2010).
Bangkok, Thailand.
"Bilingual Terminology Extraction from TMX. A state-of-the-art overview." Presentation at Translating Europe Forum 2016. Focus on translation technology.
Identification of Fertile Translations in Comparable Corpora: a Morpho-Compos... (Estelle Delpech)
Material presented at the Tenth Biennial Conference of the
Association for Machine Translation in the Americas (AMTA 2012), San Diego, CA.
Download paper at http://hal.archives-ouvertes.fr/hal-00730325.
Institutions: Laboratoire d'Informatique de Nantes Atlantique (LINA), Lingua et Machina, Gremuts
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange (Estelle Delpech)
Material presented at the TKE (Terminology and Knowledge Engineering) Conference 2010, Dublin, Ireland.
Download paper at http://hal.archives-ouvertes.fr/hal-00544403
Institutions: Laboratoire d'Informatique de Nantes Atlantique (LINA), Lingua et Machina.
Applicative evaluation of bilingual terminologies (Estelle Delpech)
Material presented at the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011), Riga, Latvia.
Download paper: http://hal.archives-ouvertes.fr/hal-00585187
Institutions: Laboratoire d'Informatique de Nantes Atlantique (LINA), Lingua et Machina
Extraction of domain-specific bilingual lexicon from comparable corpora: comp... (Estelle Delpech)
Material presented at the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai, India.
Paper download at http://hal.archives-ouvertes.fr/hal-00743807.
Institutions: Laboratoire d'Informatique de Nantes Atlantique (LINA), Lingua et Machina, Gremuts.
Challenges in the linguistic exploitation of specialized republishable web co... (Adrien Barbaresi)
Talk at RESAW conference 2015 on web crawling and web corpus construction. Challenges to tackle: metadata extraction, quality assessment, and licensing.
Much of the work goes into ensuring the "scientificity" of web texts and making them not only available but also citable in a scholarly sense.
Presentation made in the context of the FAO AIMS Webinar titled “Knowledge Organization Systems (KOS): Management of Classification Systems in the case of Organic.Edunet” (http://aims.fao.org/community/blogs/new-webinaraims-knowledge-organization-systems-kos-management-classification-systems)
21/2/2014
Directions: This assignment is for a Reading Course (AlyciaGold776)
Directions: This assignment is for a Reading Course. The cross-disciplinary unit that I will be implementing in my classroom is Social Studies (Grade 11 US History). Attached you will find a copy of the lesson plan and an attachment of Reading Standards. Current resources and tools that would enhance the learning experience for all students are Kahoot, Quizlet, or Nearpod. Must use original work and must be APA formatted.
Please review the Special Accommodations and ELL section on the last page of the lesson plan; all benchmarks and state standards for the lesson are within the lesson plan.
Benchmark - Cross-Disciplinary Unit Narrative
For this benchmark, write a 750-1,000 word narrative about a cross-disciplinary unit you would implement in your classroom. Choose a minimum of two standards, at least one for the content area of your field experience classroom and at least one supportive literacy standard, to focus on for the unit narrative. You may use your Topic 3 "Instructional Strategies for Literacy Integration Matrix" as a guide to inform this assignment.
Your narrative must include:
· Unit Description and Rationale: Complete description of unit theme and purpose, including learning objectives, based on the content area standards and literacy standards.
· Learning Opportunities: Description of two learning opportunities that create ways for students to learn, practice, and master academic language in content areas
· Collaboration: Description of how you would facilitate students’ collaborative use of current tools and resources to maximize content learning in varied contexts
· Support: Description of support that would be implemented for student literacy development across content areas
· Differentiation: Description of how the lessons within the unit would provide differentiated instruction
· Strategies: Description of strategies that you would use within your unit to advocate for equity in your classroom
· Cultural Diversity: Description of the effect of cultural diversity in the classroom on reading and writing development. Describe how the unit capitalizes on cultural diversity.
· Resources: Description of current resources and tools that would enhance the learning experience for all students.
Support your findings with 3-5 scholarly resources.
ELA Standards and Technology Matrix (Grades 11-12)
Click on the standard to view more information in CPALMS. Click on the links to visit the websites for the featured technology tools.
Grade Standards Technology
11-12 | LAFS.1112.L.3.4: Determine or clarify the meaning of unknown and multiple-meaning words and phrases based on grades 11-12 reading and content, choosing flexibly from a range of strategies.
a. Use context (e.g., the overall meaning of a sentence, paragraph, or text; a word's position or function in a sentence) as a clue to the meaning of a word or phrase.
b. Identify and correctly use patterns of word changes that indic ...
lexicog: Overview of the New Module for Lexicography of OntoLex-lemon (Prêt-à-LLOD)
K Dictionaries and Lexicala Data Workshop, October 3, 2019. Sintra, Portugal
By JULIA BOSQUE-GIL
Distributed Information Systems Group
University of Zaragoza, Spain
jbosque@unizar.es
The objective of this webinar is to provide a brief overview of the Knowledge Organization Systems (KOS) and the tools used for managing them. The presentation will focus on the management of the multilingual Organic.Edunet ontology as a case study. In this context it will present aspects such as the collaborative work, multilinguality needs and update of the concepts using an online KOS management tool (MoKi).
Workshop on Learning Technology Standards for Agriculture and Rural Development (AgroLT 2008)
September 19, 2008, Athens, Greece
In conjunction with
4th International Conference on Information and Communication Technologies in Bio and Earth Sciences (HAICTA 2008)
Currently, the Experience API (xAPI) mostly focuses on providing "structural" interoperability of xAPI statements via JavaScript Object Notation (JSON). Structural interoperability defines the syntax of the data exchange and ensures the data exchanged between systems can be interpreted at the data-field level. In comparison, semantic interoperability leverages the structural interoperability of the data exchange but also provides a vocabulary so other systems and consumers can interpret the data. Analytics produced from xAPI statements would benefit from more consistent and semantic approaches to describing domain-specific verbs, activityTypes, attachments, and extensions. The xAPI specification recommends that implementers adopt community-defined vocabularies, but the only current guidance is to provide very basic, human-readable identifier metadata (e.g., literal string name (display), description). The main objective of the Vocabulary and Semantic Interoperability Working Group (WG) is to research machine-readable semantic technologies (e.g., RDF, JSON-LD) in order to produce guidance for Communities of Practice (CoPs) on creating, publishing, or managing controlled vocabulary datasets (e.g., verbs). In this session, you will see a brief introduction to modern controlled vocabulary practices and how they can be applied to xAPI to add the semantic expressiveness of controlled vocabularies. The progress and resources of the Vocabulary WG (started in April 2015) will also be shared.
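As a sketch of what such machine-readable vocabulary metadata could look like, the following builds a JSON-LD description of a hypothetical xAPI verb with SKOS labels, using only the standard library; the vocabulary IRI and the verb itself are invented for illustration, not part of any published CoP vocabulary:

```python
import json

# A minimal JSON-LD description of a hypothetical xAPI verb. The IRI
# and verb name are invented; real Communities of Practice mint their
# own identifiers. Language maps keep labels multilingual-ready.
verb = {
    "@context": {
        "skos": "http://www.w3.org/2004/02/skos/core#",
        "prefLabel": {"@id": "skos:prefLabel", "@container": "@language"},
        "definition": {"@id": "skos:definition", "@container": "@language"},
    },
    "@id": "http://example.org/xapi/verbs#annotated",
    "@type": "skos:Concept",
    "prefLabel": {"en": "annotated"},
    "definition": {"en": "Indicates the actor added an annotation to the object."},
}

print(json.dumps(verb, indent=2))
```

Unlike the bare display/description strings of the basic guidance, such a record can be consumed both as plain JSON by existing xAPI tooling and as RDF by semantic tooling.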
A special session about using DC metadata to describe scholarly research papers held during the DC-2006 conference in Manzanillo, Mexico in October 2006.
Cross-lingual ontology lexicalisation, translation and information extraction Net2 workshop, University of South Africa (UNISA)
2. Context and Motivation. Monnet use case in the financial domain: query financial information cross-vocabulary and cross-lingual, and get results in your own language. Research challenges: localization and translation of vocabularies; cross-lingual ontology-based information extraction.
8. XBRL – Semantic Analysis “Enhance semantics to facilitate translation and information extraction.”
9. XBRL – Terminological Analysis ifrs:MinimumFinanceLeasePaymentsReceivableAtPresentValue ifrs:MinimumFinanceLeasePaymentsReceivable Minimum finance lease payments receivable, at present value sapTerm:payments googleDefine:leasePayments sapTerm:financeLease googleDefine:Finance_lease Domain Independent Domain Related Domain Specific Domain Related Domain Independent Domain Independent Domain Specific
10. XBRL – Linguistic Analysis Financial text “… received minimum finance lease payments …” verb “… lease payment …” complex singular simple minimum finance lease payments receivable XBRL term adverb … lease payments … plural
11. Outline 1. Research challenge and motivation 2. Ontology Translation 3. Lexicalization (lemon) 4. CLOBIE (CL Ontology-based Inf. Extraction)
12. Translation using STL Models developed in Monnet English / German / Spanish / Dutch …Net2 Afrikaans? Zulu? Xhosa? … ifrs:MinimumFinanceLeasePaymentsPayable ifrs:ProfitLossBeforeTax ifrs:Revenue
13. Application in Machine Translation in Dutch available-for-sale financial assets IFRS, SAPTerm, GoogleDefine 1. term analysis using: domain TM (IFRS), Linked Open Data (DBPedia), Translation services (GoogleTranslate) [available-for-sale] [financial] [assets] 2. translate subterms using: [voorverkoopbeschikbare] [financiële] [activa] 3. term synthesis using: grammars (rules, statistical models) voor verkoop beschikbare financiële activa
14. Application in Machine Translation in Afrikaans available-for-sale financial assets IFRS, SAPTerm, GoogleDefine 1. term analysis using: domain TM (IFRS), Linked Open Data (DBPedia), Translation services (GoogleTranslate) [available-for-sale] [financial] [assets] 2. translate subterms using: [beskikbaarvirverkoop] [finansiële] [bates] 3. term synthesis using: grammars (rules, statistical models) finansiële bates beskikbaar vir verkoop
15. Application in Machine Translation in Spanish available-for-sale financial assets IFRS, SAPTerm, GoogleDefine 1. term analysis using: domain TM (IFRS), Linked Open Data (DBPedia), Translation services (GoogleTranslate) [available-for-sale] [financial] [assets] 2. translate subterms using: [disponiblespara la venta] [financia] [activos] 3. term synthesis using: grammars (rules, statistical models) activos financieros disponibles para la venta
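The three-step procedure on these slides (term analysis, subterm translation, term synthesis) can be sketched in a few lines of Python; the toy lexicon and the single reordering rule below are invented stand-ins for the Monnet system's actual resources and synthesis grammars:

```python
# Toy sketch of compositional term translation: (1) segment the term
# into subterms, (2) translate each subterm with a bilingual lexicon,
# (3) synthesize the target term with a reordering rule. The lexicon
# entries and the rule are illustrative assumptions only.
LEXICON_EN_AF = {
    "available-for-sale": "beskikbaar vir verkoop",
    "financial": "finansiële",
    "assets": "bates",
}

def translate_term(term, lexicon, head_first=True):
    subterms = term.split()                      # 1. term analysis
    translated = [lexicon[s] for s in subterms]  # 2. subterm translation
    if head_first:                               # 3. term synthesis:
        # move the modifier phrase (first English subterm) to the end,
        # approximating the Afrikaans order from the slide,
        # 'finansiële bates beskikbaar vir verkoop'
        translated = translated[1:] + [translated[0]]
    return " ".join(translated)

print(translate_term("available-for-sale financial assets", LEXICON_EN_AF))
# finansiële bates beskikbaar vir verkoop
```

In the system described on the slides, the reordering step would be driven by grammar rules or statistical models rather than a fixed rule.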
16. Outline 1. Research challenge and motivation 2. Ontology Translation 3. Lexicalization (lemon) 4. CLOBIE (CL Ontology-based Inf. Extraction)
17. Why do we need a lexicon? http://en.wikipedia.org/wiki/Finance_lease http://www.investopedia.com/terms/l/lease-payments.asp "loads of unlinked domain-specific terminology on the web!" An interoperable web for …? Re-use, enable multilinguality, cross-lingual search, cross-lingual fact extraction.
18. Lexicon standards overview. ISO (XML): TEI (Text Encoding Initiative), LMF (Lexical Markup Framework). W3C & Semantic Web (RDF / OWL): built-in rdfs:label; lightweight linguistic representations (SKOS, SKOS-XL); rich linguistic representations (GOLD, LexInfo).
20. SKOS – Multilingual Information Not much uptake yet? from http://data.nytimes.com/
21. Ontology-Text Mismatch ‘Edificio-historico’ vs. ‘…edificio, declarado Monumento Histórico…’ >> goes beyond SKOS (monolingual & multilingual term variants) >> requires representation of lexical information to compute linguistic variants, e.g. ‘edificio historico[apposVP[NP[Adj]]]’
22. A Lexicon Model for Ontologies. Requirements for an 'ontology-lexicon' model: represent linguistic information relative to the ontology; avoid unnecessary ambiguities by representing only the lexical features relevant to the semantics of the underlying application; keep semantics separate from linguistic information, clearly separating 'world' knowledge (properties of the objects referred to by words) from 'word' knowledge (properties of the words themselves); modular, minimal design: provide a simple core model that can easily be extended when needed.
23. Was there a solution already? - SKOS. Simple Knowledge Organization System (SKOS): a general model for formalizing thesauri, terminologies, and related semantic and knowledge resources. Formalization of terminology in focus (terminology, classification, and Semantic Web communities). Does not address the linguistic aspects of terminology, nor, therefore, the lexicon-ontology interface. http://www.w3.org/2004/02/skos/
24. Was there a solution already? - GOLD. General Ontology for Linguistic Description (GOLD): a community-based ontology of linguistics. Linguistic study in focus (linguistics community). A formal model of linguistics as an ontology, but not about connecting lexical features to ontological semantics. Other issues: very big; modularity? http://linguistics-ontology.org/gold/2010
25. Was there a solution already? - OWN. OntoWordNet (OWN): a formal specification of WordNet through extension and axiomatization of its conceptual relations. Formal knowledge representation in focus (logic, knowledge representation, and Semantic Web communities). Turns WordNet into an ontology, but not about connecting lexical features to ontological semantics. http://wiki.loa-cnr.it/index.php/LoaWiki:OWN
26. Was there a solution already? - LMF. Lexical Markup Framework (LMF): a general model for formalizing and sharing machine-readable dictionaries. Lexical knowledge representation in focus (lexicography and NLP communities). Very close to the ontology-lexicon requirements, but offers no view on how lexical features link to ontological semantics; semantics is limited to a notion of sense based on synsets. Other issues: incomplete formal model; focus on classes, less on properties/relations. http://www.lexicalmarkupframework.org/
27. lemon: lexicon model for ontologies. A general model for formalizing lexical features relative to independently defined ontological semantics. http://www.monnet-project.eu/lemon Two-level modelling: abstract level (meta-model): lemon; instantiation level (lexicon model): e.g. 'LexInfo2' http://lexinfo.net/
30. lemon: Lexicon. Example lexicon 'wild animals' with entries such as LE 'Kudu' and LE 'shaped like a Kudu'. A LexicalEntry can be a Word, a Phrase, or a Part, such as an Affix.
31. lemon: Form. A LexicalEntry is linked to its Forms via canonicalForm, otherForm, and abstractForm (e.g. the forms "kudu", "greater", "great").
32. lemon: Structure. The entry 'shaped like a Kudu' can be decomposed into the entries 'shaped', 'like', 'a', and 'Kudu': a LexicalEntry can be decomposed into one or more Components, and the compositional structure can be represented.
33. lemon: Structure - Example. The decomposition of 'shaped like a kudu' links Components to LexicalEntries via lexeme, element, leaf, node, and edge relations, and its constituent structure can be represented as a parse tree: VP [ VBN 'shaped' (lemma="shape"), PP [ IN 'like', NP [ DT 'a', NNP 'Kudu' ] ] ].
34. lemon: Meaning & Reference. A LexicalEntry (lexeme), e.g. 'kudu', has a sense pointing to a LexicalSense, which in turn points, via the sememe/reference relation, to an ontological entity.
35. lemon: Meaning & Reference. Senses can be related: the sense of 'greater kudu' stands in a narrower relation to the sense of 'kudu', each with its own reference (preSem).
36. lemon: Meaning & Reference. Lexical incompatibility: the senses of 'greater kudu' and 'lesser kudu' are incompatible, even though both reference dbpedia:Kudu.
37. lemon: Meaning & Reference. Ontological incompatibility: the senses of 'kudu' and 'goat' reference ontology classes related by owl:disjointWith.
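Assuming an RDF-style triple encoding, the entities on these lemon slides might be written roughly as follows; the namespaces and identifiers are illustrative approximations, not the exact lemon vocabulary:

```python
# Rough sketch of the lemon entities from the preceding slides as
# subject-predicate-object triples, using plain Python tuples rather
# than an RDF library so the structure is easy to inspect. IRIs and
# property names are illustrative, not normative lemon identifiers.
triples = [
    (":lex_kudu", "rdf:type", "lemon:LexicalEntry"),
    (":lex_kudu", "lemon:canonicalForm", ":form_kudu"),
    (":form_kudu", "lemon:writtenRep", '"kudu"@en'),
    (":lex_kudu", "lemon:sense", ":sense_kudu"),
    (":sense_kudu", "lemon:reference", "dbpedia:Kudu"),
    # sense-level relation: 'greater kudu' is narrower than 'kudu'
    (":sense_greater_kudu", "lemon:narrower", ":sense_kudu"),
]

def objects(subject, predicate):
    """All objects for a given subject and predicate."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]

print(objects(":sense_kudu", "lemon:reference"))  # ['dbpedia:Kudu']
```

Note how the ontological reference (dbpedia:Kudu) attaches to the LexicalSense, not to the entry itself, keeping 'word' knowledge separate from 'world' knowledge as the requirements slide demands.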
38. lemon: Lexical Projection LexicalEntry can introduce a syntactic frame with arguments that are mapped to LexicalSense and indirectly to ontological semantic objects/properties
44. Outline 1. Research challenge and motivation 2. Ontology Translation & Inform. Extraction 3. Lexicalization (lemon) 4. CLOBIE (Cross-lingual Ontology-based Information Extraction)
45. What is CLOBIE? Information Extraction: monolingual, no semantics. Cross-lingual Information Extraction: multilingual. Ontology-based Information Extraction: semantics in the background.
46. What is CLOBIE? Information extraction (monolingual) uses per-language surface patterns, e.g. PATTERN: .*SAP.*[sells|sold|issues].*[risk securities].*[0-9]+b [EUR|USD].* matching "SAP sold risk securities at a value of 12b EUR." Information extraction (multilingual) adds patterns per language, e.g. PATTERN_DE: .*SAP.*verkaufte.*[RisikoWertpapiere].*[0-9]+b [EUR|USD].* Information extraction with semantics abstracts over ontology classes: .*$COMPANY.*[sells|sold|issues].*$ASSETS.*$MONETARY_VALUE.*, where $ASSETS covers, e.g., financial assets, non-financial assets, risk securities, and Property, Plant & Equipment.
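As a sketch, the slide's monolingual pattern can be turned into a working regular expression; the slide's bracket notation is schematic, so a real regex needs proper alternation groups, and the named groups below are my own additions:

```python
import re

# The slide's schematic pattern rendered as an actual Python regex.
# Named groups (company, assets, value, currency) are illustrative;
# the ontology-driven variant would substitute class lexicalizations
# (e.g. all subterms of 'assets') for the literal alternations.
pattern = re.compile(
    r"(?P<company>SAP)\s+(?:sells|sold|issues)\s+"
    r"(?P<assets>risk securities)\s+at a value of\s+"
    r"(?P<value>[0-9]+b)\s+(?P<currency>EUR|USD)"
)

sentence = "SAP sold risk securities at a value of 12b EUR."
m = pattern.search(sentence)
if m:
    print(m.group("company"), m.group("assets"),
          m.group("value"), m.group("currency"))
    # SAP risk securities 12b EUR
```

The semantics-aware version described on the slide replaces the hard-coded alternatives with slots filled from ontology classes, so one pattern covers every lexicalization of, say, $ASSETS.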
47. Application in Information Extraction (IE). :MinimumFinanceLeasePaymentsReceivable rdfs:subClassOf xbrli:monetaryItemType ; rdfs:label "Minimum finance lease payments receivable"@en . The label "Minimum finance lease payments receivable" is semantically lifted; term analysis and linguistic analysis yield variants such as "receivables" and "payments received", which match report text, e.g. Tesco's Annual Report 2009: "…The fair value of the Group's finance lease receivables at 23 February 2008 was £5m…"; SAP Annual Report 2008: "…As at December 31, 2008, the future minimum lease payments expected to be received was €16 million…"
48. CLOBIE Interdisciplinary Statistical MT Rule-based MT Localization Term extraction Relation extraction Extract. grammars Machine Translation Information Extraction NLP Corpus query Term analysis POS tagging Morph analysis Information Retrieval CLOBIE Semantic Web TF-IDF Web query ranking algorithms CLIR (ESA, MT-based) Ontologies SKOS, lemon SPARQL queries
49. Why CLOBIE? Many unstructured resources (news, financial reports). Knowledge in the Semantic Web is often not dynamic (no regular, only manual updates), and knowledge across languages/countries is not integrated.
53. CLOBIE Data Set (Wind Energy). 10 companies in the wind energy domain; financial reports in German / Spanish / English / Dutch; IFRS / DE-GAAP; semantics defined by the IFRS vocabulary and the xEBR vocabulary.
54. Next steps… Benchmark development and evaluation on the basis of a data set in the finance domain: financial reports and news from different companies in the wind energy domain; multilingual (German, Dutch, Spanish, English); multi-vocabulary (IFRS, European local GAAPs, DBpedia). A cross-lingual ontology-based information retrieval system. Generating ontology-based information extraction grammars from lemon ontology-lexicons.
Editor's Notes
Frame: VerbNet, … | Linguistic ontology: GOLD, LexInfo2 | Form: SKOS | LexicalSense-Ontology: SKOS-XL | Node/Edge: parse structures (rare formats such as the NEGRA Corpus / TIGER tag set by IMS Stuttgart, or proprietary StanfordParser output)
Also phrasal lexicon
Lemon distinguishes among different types of lexical forms
LexicalSense: an underspecified sense that points to a language-external reference, a unique ontological semantic object (depending on conditions and context); it can have a subsense and senseRelation with other LexicalSenses; the sememe relation between a LexicalSense and an ontological semantic object can be either pref / alt / hiddenSem.