SlideShare a Scribd company logo
1 of 59
Survey – WG3 ENeL
Automatic Knowledge
Acquisition for Lexicography
Carole Tiberius, Institute for Dutch Lexicology, Leiden, the Netherlands
Kris Heylen, University of Leuven, Belgium
Simon Krek, „Jožef Stefan“ Institute, Ljubljana, Slovenia
Purpose of the survey
Create an inventory of different types of automatic knowledge
acquisition which are currently used within the framework of
lexicographical projects
Automatic Acquisition of Knowledge
Knowledge (data) which
• is automatically obtained from corpora of authentic language
use (both synchronic and diachronic);
• forms either the input for lexicographers (who further
inspect and edit the data) or is included as is in the published
dictionary (possibly marked as being knowledge which has
been automatically derived from corpus data).
Types of Automatically Acquired Knowledge
• (Candidate) Lemma list
• Overall Lemma Frequency information
• Form variation
(e.g. irregular morphology, orthographic variants)
• Example sentences
(cf. Vienna COST workshop)
• Multiword expressions
(i.e. sequences of words with some unpredictable properties such as "to count
somebody in" or "to take a haircut", ranging from collocations and phrasal verbs,
(pragmatic) frozen expressions (e.g. of course, good morning) to traditional idioms,
proverbs etc.)
Types of Automatically Acquired Knowledge …
• Neologisms
• Definitions
• Translation Equivalents
• Knowledge Rich Contexts
(i.e. in terminography, a sort of hybrid of a good example and a definition, illustrating the
meaning characteristics of a term, but not being a formal definition.)
• Lexical-semantic relations
(e.g. synonyms, antonyms, hypernyms)
• Word senses
• Grammatical patterns
(e.g. word profiles, valency)
• Linguistic labels
(domain/ region/ dialect/ register/ style/ time/ slang and jargon/ attitude/ offensive terms)
General
• Web address: https://www.1ka.si/a/60608
• Questions: 134 (variables: 134)
• Pages: 18
• Completed: 45
• Partially completed: 6
• Total valid: 51
• All units in database: 196
• First entry: 13. 4. 2014, Last entry: 1. 5. 2015
Coverage
Positions
lexicographer
researcher
software developer
computational linguist
nlp researcher
terminologist
(associate) professor
project manager/director
phd student
Automatic Knowledge Acquisition
Q: Do you or your institution use a form of automatic
knowledge acquisition within a lexicographic project(s)?:
Answers Frequency
1 (YES) 36
2 (NO) 14
Valid 50
Lemma list
Frequency Information
Example sentences
Grammatical Patterns
MWEs
Form variation
Neologisms
Translation Equivalents
Lexical Semantic Relations
Word senses
Linguistic labels
Other
Definitions
Knowledge Rich Contexts
Other types of AKA
• Prioritizing lemmas (Denmark - Society for Danish Language and Literature)
• Word formation information (France - Université de Franche-Comté, Besançon)
• Semantic relations (France - Université de Franche-Comté, Besançon)
• Termhood probability (France - Université de Franche-Comté, Besançon)
• Selectional preferences (Hungary - Research Institute for Linguistics of the Hungarian
Academy of Sciences)
• Discourse markers (Denmark / France
Aarhus University, Business and Social Sciences, Department of Business Communication; Université de
Bourgogne, Maison des Sciences de l'Homme)
Meeting Programme:
http://www.elexicography.eu/working-groups/working-group-3/wg3-meetings/wg3-
herstmonceux-2015/
Q: Do you or your institution use automatic knowledge acquisition
for generating a candidate lemma list?
Answers Frequency
1 (YES) 23
2 (NO) 9
Valid 32
Q: Do you or your institution use automatic knowledge acquisition to
extract frequency information, e.g. overall lemma frequency
information?
Answers Frequency
1 (YES) 23
2 (NO) 9
Valid 32
Q: Do you or your institution use automatic knowledge acquisition to
extract information on form variation e.g. irregular morphology,
orthographic variants?
Answers Frequency
1 (YES) 11
2 (NO) 21
Valid 32
Q: Do you or your institution use automatic knowledge acquisition to
extract example sentences (cf. Vienna workshop)?
Answers Frequency
1 (YES) 18
2 (NO) 13
Valid 31
Q: Do you or your institution use automatic knowledge acquisition to
extract multiword expressions ( i.e. sequences of words with some
unpredictable properties such as "to count somebody in" or "to take a
haircut", ranging from collocations and phrasal verbs, (pragmatic) frozen
expressions (e.g. of course, good morning) to traditional idioms, proverbs
etc.)?
Answers Frequency
1 (YES) 14
2 (NO) 15
Valid 29
Q: Do you or your institution use automatic knowledge acquisition to
extract neologisms?
Answers Frequency
1 (YES) 10
2 (NO) 19
Valid 29
Q: Do you or your institution use automatic knowledge acquisition to
extract definitions?
Answers Frequency
1 (YES) 5
2 (NO) 23
Valid 28
Q: Do you or your institution use automatic knowledge acquisition to
extract translation equivalents?
Answers Frequency
1 (YES) 8
2 (NO) 19
Valid 27
Q: Do you or your institution use automatic knowledge acquisition to
extract knowledge rich contexts (i.e. a sort of hybrid of a good dictionary
example and a definition in the sense that is extracted from a corpus,
illustrates the meaning of a term, but it is not a formal definition) ?
Answers Frequency
1 (YES) 2
2 (NO) 26
Valid 28
Q: Do you or your institution use automatic knowledge acquisition to
extract lexical-semantic relations (e.g. synonyms, antonyms, hypernyms)
Answers Frequency
1 (YES) 7
2 (NO) 21
Valid 28
Q: Do you or your institution use automatic knowledge acquisition to
extract word senses?
Answers Frequency
1 (YES) 7
2 (NO) 21
Valid 28
Q: Do you or your institution use automatic knowledge acquisition to
extract grammatical patterns (e.g. word profiles, valency)
Answers Frequency
1 (YES) 16
2 (NO) 12
Valid 28
Q: Do you or your institution use automatic knowledge acquisition to
extract linguistic labels (e.g. domain/region/dialect/register/style/time/slang
and jargon/ attitude/ offensive terms)?
Answers Frequency
1 (YES) 7
2 (NO) 21
Valid 28
Q: Is the automatically acquired knowledge directly
integrated in the published dictionary without
human intervention?
Integrated without human intervention:
• Lemma lists
• Frequency information
• Example sentences
• Translation equivalents
• Lexical-semantic relations
Integrated with human intervention:
• Form variation
• MWE
• Neologisms
• Knowledge Rich Contexts
• Word senses
• Grammatical patterns
• Linguistic labels
Q: Is the automatically acquired knowledge directly
integrated in the published dictionary without
human intervention?
Q: How do the lexicographers judge the quality
of the automatically acquired knowledge?
• Lemma lists 4
• Frequency information 4
• Form variation 3
• Example sentences 4
• MWEs 3
• Neologisms 3
• Definitions 3
• Translation equivalents 3 - 4
• Knowledge Rich Contexts 3 – 4
• Lexical-semantic relations 3
• Word senses 3 – 4
• Grammatical patterns 4
• Linguistic labels 3
Wishes/ Comments
• automatic extraction of contrastive data annotated syntactic and
semantically
• There is a huge need for methods and tools for these tasks - if EU languages
shall be supported with high quality dictionaries published by EU institutions or
publishers - otherwise US IT giants will dominate the future. For publishers the
rights are important, as the model is changing from licensing/royalty models to
ownership models. But also for public institutions, that might publish for free,
the ownership is an important issue. There will be a certain degree of
skepticism about these methods and tools, and it will be hard to convince the
community about the quality and ROI.
• ... We use a LOT of knowledge acquisition, but it is not strictly applied to
lexicography yet.
Wishes/ Comments
• My work focuses on definitions. I have developed systems for extracting encyclopedic
definitions from encyclopedic text, also for extracting hypernyms from definitions, and for
learning taxonomies from free text using the previous systems. Part of my work also focuses
in harvesting semantic relations from the web, and disambiguating them where
possible. For my PhD work, I would like to have a system that given a set of documents which
belong to a certain domain, is able to identify candidate definitions and score them according
to their relevance to the document in which they are included, the corpus to which
document belongs, and finally the domain to which such corpus belong.
• Several researchers have applied types of automatic knowledge acquisition in their individual
research, e.g. to generate candidate lemma lists (Bratanić, Ostroški Anić and Radišić 2010,
Aviation English Terms and Collocations (An alphabetical checklist). Zagreb: Sveučilište u
Zagrebu, Fakultet prometnih znanosti.), to extract terms and collocations or
to extract frequency information(Stojanov and Vučić, 2012.
Korpusnojezikoslovna obradba tekstova Sportskih novosti. N-gramsko modeliranje
dohvaćanja podataka i vizualizacija. Filologija 59, 103-129).
Wishes/ Comments
• We use Sketch Engine functions (Word List, Collocates, Frequency) to analyze
concordances, e.g. in order to find form variations (irregular morphology,
orthographic variants). We also used function Word Sketch for extraction of
collocations.
• Domain sensitivity is a crucial lexicographical parameter and therefore,
automated processes can't be developed and exploited in the same way as in
lexicography for genreral purposes. The survey doesn't seem to include
innovative aspects on the analysis and representation of specialised knowledge
as such, so full automation has still a long way to go.
• Since we are working in the academic monolingual dictionary the level of the
corpus AAK is for us quite satisfying. In the case of such dictionary it is always
important to leave a space for a deeper semantic investigation.
• more for word sense disambiguation and definition
extraction
AKA per institution
Basque country - Elhuyar Foundation
• Lemma list
• Frequency information
• Example sentences (experimental level)
• Multiword expressions
• Neologisms
• Translation equivalents
• Grammatical Patterns (experimental level)
Elhuyar Hiztegiak (http://hiztegiak.elhuyar.org). Basque-Spanish dictionary ZTH-Dictionary of
Science and Technology (zthiztegia.elhuyar.org) Laneki Hiztegia (http://jakinbai.eu/hiztegia)
Automotive Dictionary (http://www.automotivedictionary.net/) (en, es and eu terms) Ihobe
Hiztegia environmental dictionary (intranet) CAF railway dictionary (intranet) on-going projects:
Osakidetza (Basque Health System); Social work (provincial governments of Araba, Bizkaia and
Gipuzkoa)
Belgium - KU Leuven
• Lemma List
• Frequency information
Corpus support to third party lexicographic publication on Belgian Dutch: "Typisch
Vlaams. 4000 Woorden en uitdrukkingen" [Typical Flemish. 4000 words and
expressions] http://www.davidsfonds.be/publisher/edition/detail.phtml?id=3540
• Translation equivalents
TermWise: Resources for Specialized Language Use
http://www.cs.kuleuven.be/groups/liir/projects.php?project=177
Bulgaria
Institute for Bulgarian Language
• Lemma List
• Frequency Information
• Neologisms
Dictionary of Bulgarian Language
• Lexical-semantic relations
Bulgarian WordNet; Dictionary of Bulgarian Language
Czech republic
Masaryk University, Faculty of Arts
• Lemma List
Low-cost ontology development, paper ->
http://is.muni.cz/repo/966117/gwc2012.pdf
• Word senses
Currently, in pilot - we are trying to create a new semantic network based on
combination of manually annotated data which are confirmed automatically by
corpus. This testing process can be also used for extending dictionary.
Czech republic
NLP Centre, Faculty of Informatics, Masaryk University
• Lemma List
Thesaurus for Geography Domain
• Frequency Information
DEB dictionary browser
• Example sentences
Czech Sign Language dictionary
• Grammatical patterns
Verbalex, verb valency lexicon
Czech republic- Lexical Computing
• Lemma List
• Frequency Information
• Form variation
• Example sentences
• Multiword expressions
• Neologisms
• DIACRAN
• Definitions
• Experimental
• Translation equivalents
• Lexical-semantic relations
• Distributional thesaurus in SketchEngine
• Word senses
• Clustering of word sketches
• Grammatical patterns
• Linguistic labels
(deliveries to publishers, IT companies)
Denmark
Society for Danish Language and Literature
• Other
We use an experimental mix of many of the methods mentioned above, to check
existing dictionary entries and to select and prioritize new ones. We do not use
these methods consequently and thoroughly, as suggested with this survey; this
does not fit with our dictionary-writing process.
Estonia
Institute of the Estonian Language
• Lemma List
• Frequency Information
• Example sentences
Estonian Collocations Dictionary
France
Université de Franche-Comté, Besançon
• Lemma List
• Frequency Information
• Form Variation
• Definitions
• Lexical-semantic relations
• Grammatical patterns
Sensunique project (already finished)
http://tesniere.univ-fcomte.fr/sensunique.html
http://www.station-sensunique.fr/
France
Université de Franche-Comté, Besançon
• Other
The Sensunique platform extracts or calculates from the corpora information about :
a) Functional Category of composed candidate terms (eg. Noun for stem cells); b)
Head and Expansion of composed candidate term (e.g. cells is a Head and stem is an
Expansion of stem cells) ; c) different associations between candidate terms (e.g.
inclusion : cells is totally included in stem cells; e.g. partial association : stem cells is
partially associated with dendritic cells) ; d) information relative to termhood
probability. The information extracted from corpora is enriched with the information
retrieved from the selected external resources (e.g. existing terminology databases),
such as definitions, variants, semantic classes.
Germany
Institut für Deutsche Sprache, Abteilung Lexik
• Lemma List
• Frequency Information
• Example sentences
elexiko http://www.ids-mannheim.de/lexik/elexiko.html
• Multiword expressions
Usuelle Wortverbindungen http://www.ids-mannheim.de/lexik/uwv.html
http://wvonline.ids-mannheim.de/
Greece
Institute for Language and Speech Processing, Athena RIC
• Lemma List
• Frequency Information
• Multiword expressions
• Polytropon Project: Conceptual Dictionary of Modern Greek. (Under development) Fotopoulou, A. and
Giouli, V. From "Ekfrasis" to Polytropon. Towards a dictionary of the Modern Greek Language
Conceptually organised. Paper accepted at the International Conference in Greek Linguistics (in Greek).
The Greek High School Dictionary Giouli, V., Gavrilidou, M., Lambropoulou, P. 2008. The Greek High
School Dictionary: Description and issues. In Proceedings of the XIII Euralex International Congress
(EURALEX 2008). July 2008, Barcelona, Spain. eMiLang Project Vakalopoulou, A., Giouli, V., Giagkou, M.,
and Efthimiou, E. 2011. Online Dictionaries for immigrants in Greece: Overcoming the Communication
Barriers. In Proceedings of the 2nd Conference “Electronic Lexicography in the 21st century: new
Applications for New users” (eLEX2011), Bled, Slovenia, 10-12 November 2011.
• Translation equivalents
INTERA Project http://www.elda.fr/en/projects/archived-projects/intera/ Gavrilidou, M., Labropoulou, P.,
Desipri, E., Giouli, V., Antonopoulos, V. & Piperidis, S. (2004). Building parallel corpora for {eContent}
professionals. In COLING 2004. Geneva.
Hungary
Research Institute for Linguistics of the Hungarian
Academy of Sciences
• Lemma List
• Frequency Information
• Translation equivalents
• EFNILEX 2008--2012 http://www.nytud.hu/depts/corpus/efnilex.html 2014--2015
http://corpus.nytud.hu/efnilex-vect/
• Lexicographers do not yet directly use the results, which are at the research stage yet.
• Multiword expressions
• Grammatical patterns
• Sass, Bálint and Pajzs, Júlia. FDVC -- Creating a Corpus-driven Frequency Dictionary of Verb Phrase
Constructions for Hungarian. In: Sylviane Granger, Magali Paquot (Eds.) eLexicography in the 21st
century: New challenges, new applications. Proceedings of eLex 2009, Louvain-la-Neuve, 22-24 October
2009. Cahiers du CENTAL 7. Presses universitaires de Louvain, 2010., p. 263-272
• Lexicographers manually added corpus based examples to the verb phrase constructions.
• Other
• Extending Hungarian WordNet With Selectional Preference Relations
Italy
European Academy of Bolzano/Bozen (EURAC)
• Word senses
• For the ELDIT project (www.eurac.edu/eldit) in an experimental study
Italy
University of Bologna, University of Pisa
• Multiword expressions
• Grammatical patterns
• CombiNet - Word Combinations in Italian (http://combinet.humnet.unipi.it/) We
use the broad term "Word combinations" because we target both MWEs (e.g.
phrasal lexemes, idioms, collocations) and more abstract combinatorial
information (e.g. argument structure patterns, subcategorization frames, and
selectional preferences).
Netherlands
Instituut voor Nederlandse Lexicologie
• Lemma List
• Frequency Information
• Example sentences (work in progress)
• Neologisms (work in progress)
• Grammatical patterns (work in progress)
• Linguistic labels (work in progress)
Algemeen Nederlands Woordenboek (ANW) http://anw.inl.nl/show?page=help#overhetANW
Schoonheim, Tanneke en Rob Tempelaars (2014), ‘Algemeen Nederlands Woordenboek (ANW), A
Dictionary of Contemporary Dutch’. In: www.elexicography.eu/wp-content/uploads/2014/11/Bled-
ANW-2014.pdf Schoonheim, Tanneke and Rob Tempelaars (2010), 'Dutch Lexicography in Progress,
The Algemeen Nederlands Woordenboek (ANW)'. In: Anne Dykstra and Tanneke Schoonheim (eds.),
Proceedings of the XIV Euralex International Congress. Ljouwert, Fryske Akademy/Afûk, 179
(abstract), de volledige tekst op de bijgevoegde cd-rom.
http://www.euralex.org/elx_proceedings/Euralex2010/059_Euralex_2010_3_SCHOONHEIM
TEMPELAARS_Dutch Lexicography in Progress_the Algemeen Nederlands Woordenboek_ANW.pdf
Poland
Institute of the Polish Language PAS
• AKA types not further specified in survey
Poland
Institute of the Polish Language at the Polish Academy
of Sciences (IJP PAN)
• Frequency Information
• Form variation
• Example sentences
• Neologisms
• Word senses
• Grammatical patterns
• Linguistic labels
• Great Dictionary of Polish, www.wsjp.pl
• Multiword expressions
• Great Dictionary of Polish, www.wsjp.pl (idioms, proverbs, scientific multiword terms,
other discontinuous textual units - so called functional units).
Portugal
Centro de Linguística da Universidade de Lisboa
• Lemma List
• Reference Corpus of Contemporary Portuguese http://www.clul.ul.pt/en/resources/183-
reference-corpus-of-contemporary-portuguese-crpc
• Frequency Information
• Multifunctional computational lexicon of contemporary portuguese
http://www.clul.ul.pt/en/research-teams/194-multifunctional-computational-lexicon-of-
contemporary-portuguese
• Example sentences
• Dicionário da Academia das Ciências de Lisboa
• Multiword expressions
Word combinations in the Portuguese language http://www.clul.ul.pt/en/research-
teams/187-combina-pt-word-combinations-in-portuguese-language
• Lexical-semantic relations
Portugal
Centro de Linguística da Universidade Nova de Lisboa
Faculdade de Ciências Sociais e Húmanas
• Lemma List
• Frequency Information
• Form variation
• Example sentences
• Multiword expressions
• Neologisms
• Translation equivalents
• Knowledge rich contexts
• Lexical-semantic relations
• Word senses
• Grammatical patterns
• For LSP
Slovakia
Ľ. Štúr Institute of Linguistics, Slovak Academy of
Sciences
• Lemma List
• Form variation
• Handbook of Slovak Nouns http://slovniky.korpus.sk/?d=noundb
• Example sentences
• Handbook of Slovak Nouns http://slovniky.korpus.sk/?d=noundb Parallel Corpora Phrases (en-sk, cs-sk,
bg-sk) http://slovniky.korpus.sk/
• Frequency Information
• Handbook of Slovak Nouns http://slovniky.korpus.sk/?d=noundb Dictionary of Contemporary Slovak
http://www.juls.savba.sk/oddelenie_sucas_lexikografie_vyskumna_cinnost.html Slovak-Czech
Dictionary http://gacr311.ujc.cas.cz/web/
• Multiword expressions
• Dictionary of Slovak Collocations http://vronk.net/wicol
• Translation equivalents
• phrases from parallel corpora (en-sk,cs-sk,bg-sk) http://slovniky.korpus.sk/
• Grammatical patterns
• Slovak Valency Dictionary (internal database, no URL yet)
Slovenia
University of Ljubljana, Faculty of Arts; Trojina,
Institute for Applied Slovene Studies; Jožef Stefan
Institute
• Lemma List
• Frequency Information
• Multiword expressions
• Grammatical patterns
• Linguistic labels
• Communication in Slovene: http://eng.slovenscina.eu Slovene Lexical Database:
http://eng.slovenscina.eu/spletni-slovar/leksikalna-baza Sloleks - morphological lexicon:
http://eng.slovenscina.eu/sloleks/opis Termis: http://www.termis.fdv.uni-lj.si/index-
en.html
• Form Variation
• Communication in Slovene: http://eng.slovenscina.eu Ortography Guide:
http://eng.slovenscina.eu/portali/slogovni-prirocnik
Spain
Universidade da Coruña and Real Academia Galega
• Example sentences
• Neologisms
• Definitions
• Lexical-semantic relations
• Word senses
• Linguistic labels
Spanish-Galician Dictionary of the Royal Galician Academy
No publications on the automatic acquisition of knowledge
Spain
University Institute for Applied Linguistics
(Pompeu Fabra University)
• Lemma list
• Frequency information
• Example sentences
• Grammatical patterns
Terminus 2.0, a web application for corpus and terminology
managment http://terminus.iula.upf.edu
• Neologisms
Buscaneo http://obneo.iula.upf.edu/buscaneo/
Sweden
University of Gothenburg, Dpt. of Swedish, Språkbanken
• Lemma lists 1. Kelly (http://spraakbanken.gu.se/eng/kelly) 2. Academic Wordlist (AO,
http://spraakbanken.gu.se/eng/forskning/akademiska-ordlistor) 3. SVALex (ongoing, target
Swedish as a second language lexicon);
• Form variation 1) Diabase (http://spraakbanken.gu.se/eng/forskning/diabase); 2) Mathir
(http://spraakbanken.gu.se/eng/mathir);
• MWEs Constructicon (http://spraakbanken.gu.se/eng/sweccn);
• Example sentences (HitEx (http://spraakbanken.gu.se/larka/larka_hitex_index.html)
• Definitions (Semantic Interoperability and Data Mining in Biomedicine
(http://cordis.europa.eu/project/rcn/71155_en.html)
• Lexical-semantic relations (SweFN++ (http://spraakbanken.gu.se/eng/swefn)
• Word senses (Distributional Methods to Represent the Meaning of Frames and Constructions
(http://spraakbanken.gu.se/eng/corpsem)
• Grammatical patterns (Culturomics (http://spraakbanken.gu.se/eng/culturomics)
• Linguistic labels (1. A Swedish vocation list 2. ongoing PhD thesis on automatic readability
classification of texts and sentences 3. Semantics in Storytelling in Swedish Fiction (list of
relations, named entity recognition, aliases)
Switzerland
École Polytechnique Fédérale de Lausanne
• Example sentences:
Kamusi Global Online Living Dictionary http://kamusi.org
Denmark / France
Aarhus University, Business and Social Sciences,
Department of Business Communication;
Université de Bourgogne, Maison des Sciences de
l'Homme
• Lemma list
• Form variation
• Example sentences
• Multiword expressions
• Neologisms
• Knowledge rich contexts
• Oenolex, wine dictionary
• Other
• Discourse markers acquisition and discourse interaction markers

More Related Content

Similar to ENeL_WG3_Survey-AKA4Lexicography-TiberiusHeylenKrek (1).pptx

The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...Seth Grimes
 
Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...
Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...
Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...Alannah Fitzgerald
 
NLP Tasks and Applications.ppt useful in
NLP Tasks and Applications.ppt useful inNLP Tasks and Applications.ppt useful in
NLP Tasks and Applications.ppt useful inKumari Naveen
 
lect36-tasks.ppt
lect36-tasks.pptlect36-tasks.ppt
lect36-tasks.pptHaHa501620
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguisticsAdnanBaloch15
 
Learning and Text Analysis for Ontology Engineering
Learning and Text Analysis for Ontology EngineeringLearning and Text Analysis for Ontology Engineering
Learning and Text Analysis for Ontology Engineeringbutest
 
16. Anne Schumann (USAAR) Terminology and Ontologies 1
16. Anne Schumann (USAAR) Terminology and Ontologies 116. Anne Schumann (USAAR) Terminology and Ontologies 1
16. Anne Schumann (USAAR) Terminology and Ontologies 1RIILP
 
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using OntologiesESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologieseswcsummerschool
 
NON-TECHNICAL COMPUTER THESAURUS VERSUS SPECIALIZED COMPUTER THESAURUS
NON-TECHNICAL COMPUTER THESAURUSVERSUSSPECIALIZED COMPUTER THESAURUSNON-TECHNICAL COMPUTER THESAURUSVERSUSSPECIALIZED COMPUTER THESAURUS
NON-TECHNICAL COMPUTER THESAURUS VERSUS SPECIALIZED COMPUTER THESAURUSSabadel
 
Definiendo el enfoque lfe
Definiendo el enfoque lfeDefiniendo el enfoque lfe
Definiendo el enfoque lfeNatalia Rojo
 
A statistical approach to term extraction.pdf
A statistical approach to term extraction.pdfA statistical approach to term extraction.pdf
A statistical approach to term extraction.pdfJasmine Dixon
 
Tracking Learning: Using Corpus Linguistics to Assess Language Development
Tracking Learning: Using Corpus Linguistics to Assess Language DevelopmentTracking Learning: Using Corpus Linguistics to Assess Language Development
Tracking Learning: Using Corpus Linguistics to Assess Language DevelopmentCALPER
 
Word sense disambiguation a survey
Word sense disambiguation  a surveyWord sense disambiguation  a survey
Word sense disambiguation a surveyijctcm
 
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Chunyang Chen
 
CH 3_The Linguistics of Second Language Acquisition.pptx
CH 3_The Linguistics of Second Language Acquisition.pptxCH 3_The Linguistics of Second Language Acquisition.pptx
CH 3_The Linguistics of Second Language Acquisition.pptxVATHVARY
 

Similar to ENeL_WG3_Survey-AKA4Lexicography-TiberiusHeylenKrek (1).pptx (20)

The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
 
Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...
Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...
Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...
 
NLP Tasks and Applications.ppt useful in
NLP Tasks and Applications.ppt useful inNLP Tasks and Applications.ppt useful in
NLP Tasks and Applications.ppt useful in
 
lect36-tasks.ppt
lect36-tasks.pptlect36-tasks.ppt
lect36-tasks.ppt
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguistics
 
REPORT.doc
REPORT.docREPORT.doc
REPORT.doc
 
NLP_KASHK: Introduction
NLP_KASHK: Introduction NLP_KASHK: Introduction
NLP_KASHK: Introduction
 
RusLTC at TSD-2014 (Brno)
RusLTC at TSD-2014 (Brno)RusLTC at TSD-2014 (Brno)
RusLTC at TSD-2014 (Brno)
 
Learning and Text Analysis for Ontology Engineering
Learning and Text Analysis for Ontology EngineeringLearning and Text Analysis for Ontology Engineering
Learning and Text Analysis for Ontology Engineering
 
16. Anne Schumann (USAAR) Terminology and Ontologies 1
16. Anne Schumann (USAAR) Terminology and Ontologies 116. Anne Schumann (USAAR) Terminology and Ontologies 1
16. Anne Schumann (USAAR) Terminology and Ontologies 1
 
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using OntologiesESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
 
NON-TECHNICAL COMPUTER THESAURUS VERSUS SPECIALIZED COMPUTER THESAURUS
NON-TECHNICAL COMPUTER THESAURUSVERSUSSPECIALIZED COMPUTER THESAURUSNON-TECHNICAL COMPUTER THESAURUSVERSUSSPECIALIZED COMPUTER THESAURUS
NON-TECHNICAL COMPUTER THESAURUS VERSUS SPECIALIZED COMPUTER THESAURUS
 
Definiendo el enfoque lfe
Definiendo el enfoque lfeDefiniendo el enfoque lfe
Definiendo el enfoque lfe
 
A statistical approach to term extraction.pdf
A statistical approach to term extraction.pdfA statistical approach to term extraction.pdf
A statistical approach to term extraction.pdf
 
Ontology
OntologyOntology
Ontology
 
Tracking Learning: Using Corpus Linguistics to Assess Language Development
Tracking Learning: Using Corpus Linguistics to Assess Language DevelopmentTracking Learning: Using Corpus Linguistics to Assess Language Development
Tracking Learning: Using Corpus Linguistics to Assess Language Development
 
Using pedagogic corpora in ELT
Using pedagogic corpora in ELTUsing pedagogic corpora in ELT
Using pedagogic corpora in ELT
 
Word sense disambiguation a survey
Word sense disambiguation  a surveyWord sense disambiguation  a survey
Word sense disambiguation a survey
 
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
 
CH 3_The Linguistics of Second Language Acquisition.pptx
CH 3_The Linguistics of Second Language Acquisition.pptxCH 3_The Linguistics of Second Language Acquisition.pptx
CH 3_The Linguistics of Second Language Acquisition.pptx
 

More from SyedNadeemAbbas6

Language Programs and Policies in Multilingual Societies.pptx
Language Programs and Policies in Multilingual Societies.pptxLanguage Programs and Policies in Multilingual Societies.pptx
Language Programs and Policies in Multilingual Societies.pptxSyedNadeemAbbas6
 
1606983314-connected-speech.ppt
1606983314-connected-speech.ppt1606983314-connected-speech.ppt
1606983314-connected-speech.pptSyedNadeemAbbas6
 
Teaching Receptive and Productive Skills.pptx
Teaching Receptive and Productive Skills.pptxTeaching Receptive and Productive Skills.pptx
Teaching Receptive and Productive Skills.pptxSyedNadeemAbbas6
 
Teaching Receptive and Productive Skills.pptx
Teaching Receptive and Productive Skills.pptxTeaching Receptive and Productive Skills.pptx
Teaching Receptive and Productive Skills.pptxSyedNadeemAbbas6
 
vdocuments.mx_language-comprehension.ppt
vdocuments.mx_language-comprehension.pptvdocuments.mx_language-comprehension.ppt
vdocuments.mx_language-comprehension.pptSyedNadeemAbbas6
 
1. Phonetics & phonology.pptx
1. Phonetics & phonology.pptx1. Phonetics & phonology.pptx
1. Phonetics & phonology.pptxSyedNadeemAbbas6
 
dokumen.tips_vowels-diphthongs-56a6087239fdc.ppt
dokumen.tips_vowels-diphthongs-56a6087239fdc.pptdokumen.tips_vowels-diphthongs-56a6087239fdc.ppt
dokumen.tips_vowels-diphthongs-56a6087239fdc.pptSyedNadeemAbbas6
 
typesofsyllabusdesign-190728190707.pptx
typesofsyllabusdesign-190728190707.pptxtypesofsyllabusdesign-190728190707.pptx
typesofsyllabusdesign-190728190707.pptxSyedNadeemAbbas6
 
jazz_presentation_22022022.pptx
jazz_presentation_22022022.pptxjazz_presentation_22022022.pptx
jazz_presentation_22022022.pptxSyedNadeemAbbas6
 
Sentence Structure Types.pptx
Sentence Structure Types.pptxSentence Structure Types.pptx
Sentence Structure Types.pptxSyedNadeemAbbas6
 
Chapter 6 Slides-Language & cultural identity (1).pptx
Chapter 6 Slides-Language & cultural identity (1).pptxChapter 6 Slides-Language & cultural identity (1).pptx
Chapter 6 Slides-Language & cultural identity (1).pptxSyedNadeemAbbas6
 

More from SyedNadeemAbbas6 (20)

Language Programs and Policies in Multilingual Societies.pptx
Language Programs and Policies in Multilingual Societies.pptxLanguage Programs and Policies in Multilingual Societies.pptx
Language Programs and Policies in Multilingual Societies.pptx
 
1606983314-connected-speech.ppt
1606983314-connected-speech.ppt1606983314-connected-speech.ppt
1606983314-connected-speech.ppt
 
Teaching Receptive and Productive Skills.pptx
Teaching Receptive and Productive Skills.pptxTeaching Receptive and Productive Skills.pptx
Teaching Receptive and Productive Skills.pptx
 
13521860 (2).ppt
13521860 (2).ppt13521860 (2).ppt
13521860 (2).ppt
 
Hafta7 (1).pptx
Hafta7 (1).pptxHafta7 (1).pptx
Hafta7 (1).pptx
 
Teaching Receptive and Productive Skills.pptx
Teaching Receptive and Productive Skills.pptxTeaching Receptive and Productive Skills.pptx
Teaching Receptive and Productive Skills.pptx
 
vdocuments.mx_language-comprehension.ppt
vdocuments.mx_language-comprehension.pptvdocuments.mx_language-comprehension.ppt
vdocuments.mx_language-comprehension.ppt
 
310-13-student.ppt
310-13-student.ppt310-13-student.ppt
310-13-student.ppt
 
Stress.ppt
Stress.pptStress.ppt
Stress.ppt
 
1. Phonetics & phonology.pptx
1. Phonetics & phonology.pptx1. Phonetics & phonology.pptx
1. Phonetics & phonology.pptx
 
dokumen.tips_vowels-diphthongs-56a6087239fdc.ppt
dokumen.tips_vowels-diphthongs-56a6087239fdc.pptdokumen.tips_vowels-diphthongs-56a6087239fdc.ppt
dokumen.tips_vowels-diphthongs-56a6087239fdc.ppt
 
typesofsyllabusdesign-190728190707.pptx
typesofsyllabusdesign-190728190707.pptxtypesofsyllabusdesign-190728190707.pptx
typesofsyllabusdesign-190728190707.pptx
 
cAUSES OF ERRORS.pptx
cAUSES OF ERRORS.pptxcAUSES OF ERRORS.pptx
cAUSES OF ERRORS.pptx
 
Hafta12 (1).pptx
Hafta12 (1).pptxHafta12 (1).pptx
Hafta12 (1).pptx
 
Women Welfare2.pptx
Women Welfare2.pptxWomen Welfare2.pptx
Women Welfare2.pptx
 
jazz_presentation_22022022.pptx
jazz_presentation_22022022.pptxjazz_presentation_22022022.pptx
jazz_presentation_22022022.pptx
 
Sentence Structure Types.pptx
Sentence Structure Types.pptxSentence Structure Types.pptx
Sentence Structure Types.pptx
 
Word Formation.ppt
Word Formation.pptWord Formation.ppt
Word Formation.ppt
 
Chapter 6 Slides-Language & cultural identity (1).pptx
Chapter 6 Slides-Language & cultural identity (1).pptxChapter 6 Slides-Language & cultural identity (1).pptx
Chapter 6 Slides-Language & cultural identity (1).pptx
 
seltpresentation.ppt
seltpresentation.pptseltpresentation.ppt
seltpresentation.ppt
 

Recently uploaded

What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........LeaCamillePacle
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
ROOT CAUSE ANALYSIS PowerPoint Presentation
ROOT CAUSE ANALYSIS PowerPoint PresentationROOT CAUSE ANALYSIS PowerPoint Presentation
ROOT CAUSE ANALYSIS PowerPoint PresentationAadityaSharma884161
 

Recently uploaded (20)

What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
ROOT CAUSE ANALYSIS PowerPoint Presentation
ROOT CAUSE ANALYSIS PowerPoint PresentationROOT CAUSE ANALYSIS PowerPoint Presentation
ROOT CAUSE ANALYSIS PowerPoint Presentation
 

ENeL_WG3_Survey-AKA4Lexicography-TiberiusHeylenKrek (1).pptx

  • 1. Survey – WG3 ENeL Automatic Knowledge Acquisition for Lexicography Carole Tiberius, Institute for Dutch Lexicology, Leiden, the Netherlands Kris Heylen, University of Leuven, Belgium Simon Krek, „Jožef Stefan“ Institute, Ljubljana, Slovenia
  • 2. Purpose of the survey Create an inventory of different types of automatic knowledge acquisition which are currently used within the framework of lexicographical projects
  • 3. Automatic Acquisition of Knowledge Knowledge (data) which • is automatically obtained from corpora of authentic language use (both synchronic and diachronic); • forms either the input for lexicographers (who further inspect and edit the data) or is included as is in the published dictionary (possibly marked as being knowledge which has been automatically derived from corpus data).
  • 4. Types of Automatically Acquired Knowledge • (Candidate) Lemma list • Overall Lemma Frequency information • Form variation (e.g. irregular morphology, orthographic variants) • Example sentences (cf. Vienna COST workshop) • Multiword expressions (i.e. sequences of words with some unpredictable properties such as "to count somebody in" or "to take a haircut", ranging from collocations and phrasal verbs, (pragmatic) frozen expressions (e.g. of course, good morning) to traditional idioms, proverbs etc.)
  • 5. Types of Automatically Acquired Knowledge … • Neologisms • Definitions • Translation Equivalents • Knowledge Rich Contexts (i.e. in terminography, a sort of hybrid of a good example and a definition, illustrating the meaning characteristics of a term, but not being a formal definition.) • Lexical-semantic relations (e.g. synonyms, antonyms, hypernyms) • Word senses • Grammatical patterns (e.g. word profiles, valency) • Linguistic labels (domain/ region/ dialect/ register/ style/ time/ slang and jargon/ attitude/ offensive terms)
  • 6. General • Web address: https://www.1ka.si/a/60608 • Questions: 134 (variables: 134) • Pages: 18 • Completed: 45 • Partially completed: 6 • Total valid: 51 • All units in database: 196 • First entry: 13. 4. 2014, Last entry: 1. 5. 2015
  • 8. Positions lexicographer researcher software developer computational linguist nlp researcher terminologist (associate) professor project manager/director phd student
  • 9. Automatic Knowledge Acquisition Q: Do you or your institution use a form of automatic knowledge acquisition within a lexicographic project(s)?: Answers Frequency 1 (YES) 36 2 (NO) 14 Valid 50
  • 10. Lemma list Frequency Information Example sentences Grammatical Patterns MWEs Form variation Neologisms Translation Equivalents Lexical Semantic Relations Word senses Linguistic labels Other Definitions Knowledge Rich Contexts
  • 11. Other types of AKA • Prioritizing lemmas (Denmark - Society for Danish Language and Literature) • Word formation information (France - Université de Franche-Comté, Besançon) • Semantic relations (France - Université de Franche-Comté, Besançon) • Termhood probability (France - Université de Franche-Comté, Besançon) • Selectional preferences (Hungary - Research Institute for Linguistics of the Hungarian Academy of Sciences) • Discourse markers (Denmark / France Aarhus University, Business and Social Sciences, Department of Business Communication; Université de Bourgogne, Maison des Sciences de l'Homme)
  • 13. Q: Do you or your institution use automatic knowledge acquisition for generating a candidate lemma list? Answers Frequency 1 (YES) 23 2 (NO) 9 Valid 32
  • 14. Q: Do you or your institution use automatic knowledge acquisition to extract frequency information, e.g. overall lemma frequency information? Answers Frequency 1 (YES) 23 2 (NO) 9 Valid 32
  • 15. Q: Do you or your institution use automatic knowledge acquisition to extract information on form variation e.g. irregular morphology, orthographic variants? Answers Frequency 1 (YES) 11 2 (NO) 21 Valid 32
  • 16. Q: Do you or your institution use automatic knowledge acquisition to extract example sentences (cf. Vienna workshop)? Answers Frequency 1 (YES) 18 2 (NO) 13 Valid 31
  • 17. Q: Do you or your institution use automatic knowledge acquisition to extract multiword expressions ( i.e. sequences of words with some unpredictable properties such as "to count somebody in" or "to take a haircut", ranging from collocations and phrasal verbs, (pragmatic) frozen expressions (e.g. of course, good morning) to traditional idioms, proverbs etc.)? Answers Frequency 1 (YES) 14 2 (NO) 15 Valid 29
  • 18. Q: Do you or your institution use automatic knowledge acquisition to extract neologisms? Answers Frequency 1 (YES) 10 2 (NO) 19 Valid 29
  • 19. Q: Do you or your institution use automatic knowledge acquisition to extract definitions? Answers Frequency 1 (YES) 5 2 (NO) 23 Valid 28
  • 20. Q: Do you or your institution use automatic knowledge acquisition to extract translation equivalents? Answers Frequency 1 (YES) 8 2 (NO) 19 Valid 27
  • 21. Q: Do you or your institution use automatic knowledge acquisition to extract knowledge rich contexts (i.e. a sort of hybrid of a good dictionary example and a definition in the sense that is extracted from a corpus, illustrates the meaning of a term, but it is not a formal definition) ? Answers Frequency 1 (YES) 2 2 (NO) 26 Valid 28
  • 22. Q: Do you or your institution use automatic knowledge acquisition to extract lexical-semantic relations (e.g. synonyms, antonyms, hypernyms) Answers Frequency 1 (YES) 7 2 (NO) 21 Valid 28
  • 23. Q: Do you or your institution use automatic knowledge acquisition to extract word senses? Answers Frequency 1 (YES) 7 2 (NO) 21 Valid 28
  • 24. Q: Do you or your institution use automatic knowledge acquisition to extract grammatical patterns (e.g. word profiles, valency) Answers Frequency 1 (YES) 16 2 (NO) 12 Valid 28
  • 25. Q: Do you or your institution use automatic knowledge acquisition to extract linguistic labels (e.g. domain/region/dialect/register/style/time/slang and jargon/ attitude/ offensive terms)? Answers Frequency 1 (YES) 7 2 (NO) 21 Valid 28
  • 26. Q: Is the automatically acquired knowledge directly integrated in the published dictionary without human intervention? Integrated without human intervention: • Lemma lists • Frequency information • Example sentences • Translation equivalents • Lexical-semantic relations
  • 27. Integrated with human intervention: • Form variation • MWE • Neologisms • Knowledge Rich Contexts • Word senses • Grammatical patterns • Linguistic labels Q: Is the automatically acquired knowledge directly integrated in the published dictionary without human intervention?
  • 28. Q: How do the lexicographers judge the quality of the automatically acquired knowledge? • Lemma lists 4 • Frequency information 4 • Form variation 3 • Example sentences 4 • MWEs 3 • Neologisms 3 • Definitions 3 • Translation equivalents 3 - 4 • Knowledge Rich Contexts 3 – 4 • Lexical-semantic relations 3 • Word senses 3 – 4 • Grammatical patterns 4 • Linguistic labels 3
  • 29. Wishes/ Comments • automatic extraction of contrastive data annotated syntactic and semantically • There is a huge need for methods and tools for these tasks - if EU languages shall be supported with high quality dictionaries published by EU institutions or publishers - otherwise US IT giants will dominate the future. For publishers the rights are important, as the model is changing from licensing/royalty models to ownership models. But also for public institutions, that might publish for free, the ownership is an important issue. There will be a certain degree of skepticism about these methods and tools, and it will be hard to convince the community about the quality and ROI. • ... We use a LOT of knowledge acquisition, but it is not strictly applied to lexicography yet.
  • 30. Wishes/ Comments • My work focuses on definitions. I have developed systems for extracting encyclopedic definitions from encyclopedic text, also for extracting hypernyms from definitions, and for learning taxonomies from free text using the previous systems. Part of my work also focuses in harvesting semantic relations from the web, and disambiguating them where possible. For my PhD work, I would like to have a system that given a set of documents which belong to a certain domain, is able to identify candidate definitions and score them according to their relevance to the document in which they are included, the corpus to which document belongs, and finally the domain to which such corpus belong. • Several researchers have applied types of automatic knowledge acquisition in their individual research, e.g. to generate candidate lemma lists (Bratanić, Ostroški Anić and Radišić 2010, Aviation English Terms and Collocations (An alphabetical checklist). Zagreb: Sveučilište u Zagrebu, Fakultet prometnih znanosti.), to extract terms and collocations or to extract frequency information(Stojanov and Vučić, 2012. Korpusnojezikoslovna obradba tekstova Sportskih novosti. N-gramsko modeliranje dohvaćanja podataka i vizualizacija. Filologija 59, 103-129).
  • 31. Wishes/ Comments • We use Sketch Engine functions (Word List, Collocates, Frequency) to analyze concordances, e.g. in order to find form variations (irregular morphology, orthographic variants). We also used function Word Sketch for extraction of collocations. • Domain sensitivity is a crucial lexicographical parameter and therefore, automated processes can't be developed and exploited in the same way as in lexicography for genreral purposes. The survey doesn't seem to include innovative aspects on the analysis and representation of specialised knowledge as such, so full automation has still a long way to go. • Since we are working in the academic monolingual dictionary the level of the corpus AAK is for us quite satisfying. In the case of such dictionary it is always important to leave a space for a deeper semantic investigation. • more for word sense disambiguation and definition extraction
  • 33. Basque country - Elhuyar Foundation • Lemma list • Frequency information • Example sentences (experimental level) • Multiword expressions • Neologisms • Translation equivalents • Grammatical Patterns (experimental level) Elhuyar Hiztegiak (http://hiztegiak.elhuyar.org). Basque-Spanish dictionary ZTH-Dictionary of Science and Technology (zthiztegia.elhuyar.org) Laneki Hiztegia (http://jakinbai.eu/hiztegia) Automotive Dictionary (http://www.automotivedictionary.net/) (en, es and eu terms) Ihobe Hiztegia environmental dictionary (intranet) CAF railway dictionary (intranet) on-going projects: Osakidetza (Basque Health System); Social work (provincial governments of Araba, Bizkaia and Gipuzkoa)
  • 34. Belgium - KU Leuven • Lemma List • Frequency information Corpus support to third party lexicographic publication on Belgian Dutch: "Typisch Vlaams. 4000 Woorden en uitdrukkingen" [Typical Flemish. 4000 words and expressions] http://www.davidsfonds.be/publisher/edition/detail.phtml?id=3540 • Translation equivalents TermWise: Resources for Specialized Language Use http://www.cs.kuleuven.be/groups/liir/projects.php?project=177
  • 35. Bulgaria Institute for Bulgarian Language • Lemma List • Frequency Information • Neologisms Dictionary of Bulgarian Language • Lexical-semantic relations Bulgarian WordNet; Dictionary of Bulgarian Language
  • 36. Czech republic Masaryk University, Faculty of Arts • Lemma List Low-cost ontology development, paper -> http://is.muni.cz/repo/966117/gwc2012.pdf • Word senses Currently, in pilot - we are trying to create a new semantic network based on combination of manually annotated data which are confirmed automatically by corpus. This testing process can be also used for extending dictionary.
  • 37. Czech republic NLP Centre, Faculty of Informatics, Masaryk University • Lemma List Thesaurus for Geography Domain • Frequency Information DEB dictionary browser • Example sentences Czech Sign Language dictionary • Grammatical patterns Verbalex, verb valency lexicon
  • 38. Czech republic- Lexical Computing • Lemma List • Frequency Information • Form variation • Example sentences • Multiword expressions • Neologisms • DIACRAN • Definitions • Experimental • Translation equivalents • Lexical-semantic relations • Distributional thesaurus in SketchEngine • Word senses • Clustering of word sketches • Grammatical patterns • Linguistic labels (deliveries to publishers, IT companies)
  • 39. Denmark Society for Danish Language and Literature • Other We use an experimental mix of many of the methods mentioned above, to check existing dictionary entries and to select and prioritize new ones. We do not use these methods consequently and thoroughly, as suggested with this survey; this does not fit with our dictionary-writing process.
  • 40. Estonia Institute of the Estonian Language • Lemma List • Frequency Information • Example sentences Estonian Collocations Dictionary
  • 41. France Université de Franche-Comté, Besançon • Lemma List • Frequency Information • Form Variation • Definitions • Lexical-semantic relations • Grammatical patterns Sensunique project (already finished) http://tesniere.univ-fcomte.fr/sensunique.html http://www.station-sensunique.fr/
  • 42. France Université de Franche-Comté, Besançon • Other The Sensunique platform extracts or calculates from the corpora information about : a) Functional Category of composed candidate terms (eg. Noun for stem cells); b) Head and Expansion of composed candidate term (e.g. cells is a Head and stem is an Expansion of stem cells) ; c) different associations between candidate terms (e.g. inclusion : cells is totally included in stem cells; e.g. partial association : stem cells is partially associated with dendritic cells) ; d) information relative to termhood probability. The information extracted from corpora is enriched with the information retrieved from the selected external resources (e.g. existing terminology databases), such as definitions, variants, semantic classes.
  • 43. Germany Institut für Deutsche Sprache, Abteilung Lexik • Lemma List • Frequency Information • Example sentences elexiko http://www.ids-mannheim.de/lexik/elexiko.html • Multiword expressions Usuelle Wortverbindungen http://www.ids-mannheim.de/lexik/uwv.html http://wvonline.ids-mannheim.de/
  • 44. Greece Institute for Language and Speech Processing, Athena RIC • Lemma List • Frequency Information • Multiword expressions • Polytropon Project: Conceptual Dictionary of Modern Greek. (Under development) Fotopoulou, A. and Giouli, V. From "Ekfrasis" to Polytropon. Towards a dictionary of the Modern Greek Language Conceptually organised. Paper accepted at the International Conference in Greek Linguistics (in Greek). The Greek High School Dictionary Giouli, V., Gavrilidou, M., Lambropoulou, P. 2008. The Greek High School Dictionary: Description and issues. In Proceedings of the XIII Euralex International Congress (EURALEX 2008). July 2008, Barcelona, Spain. eMiLang Project Vakalopoulou, A., Giouli, V., Giagkou, M., and Efthimiou, E. 2011. Online Dictionaries for immigrants in Greece: Overcoming the Communication Barriers. In Proceedings of the 2nd Conference “Electronic Lexicography in the 21st century: new Applications for New users” (eLEX2011), Bled, Slovenia, 10-12 November 2011. • Translation equivalents INTERA Project http://www.elda.fr/en/projects/archived-projects/intera/ Gavrilidou, M., Labropoulou, P., Desipri, E., Giouli, V., Antonopoulos, V. & Piperidis, S. (2004). Building parallel corpora for {eContent} professionals. In COLING 2004. Geneva.
  • 45. Hungary Research Institute for Linguistics of the Hungarian Academy of Sciences • Lemma List • Frequency Information • Translation equivalents • EFNILEX 2008--2012 http://www.nytud.hu/depts/corpus/efnilex.html 2014--2015 http://corpus.nytud.hu/efnilex-vect/ • Lexicographers do not yet directly use the results, which are at the research stage yet. • Multiword expressions • Grammatical patterns • Sass, Bálint and Pajzs, Júlia. FDVC -- Creating a Corpus-driven Frequency Dictionary of Verb Phrase Constructions for Hungarian. In: Sylviane Granger, Magali Paquot (Eds.) eLexicography in the 21st century: New challenges, new applications. Proceedings of eLex 2009, Louvain-la-Neuve, 22-24 October 2009. Cahiers du CENTAL 7. Presses universitaires de Louvain, 2010., p. 263-272 • Lexicographers manually added corpus based examples to the verb phrase constructions. • Other • Extending Hungarian WordNet With Selectional Preference Relations
  • 46. Italy European Academy of Bolzano/Bozen (EURAC) • Word senses • For the ELDIT project (www.eurac.edu/eldit) in an experimental study
  • 47. Italy University of Bologna, University of Pisa • Multiword expressions • Grammatical patterns • CombiNet - Word Combinations in Italian (http://combinet.humnet.unipi.it/) We use the broad term "Word combinations" because we target both MWEs (e.g. phrasal lexemes, idioms, collocations) and more abstract combinatorial information (e.g. argument structure patterns, subcategorization frames, and selectional preferences).
  • 48. Netherlands Instituut voor Nederlandse Lexicologie • Lemma List • Frequency Information • Example sentences (work in progress) • Neologisms (work in progress) • Grammatical patterns (work in progress) • Linguistic labels (work in progress) Algemeen Nederlands Woordenboek (ANW) http://anw.inl.nl/show?page=help#overhetANW Schoonheim, Tanneke en Rob Tempelaars (2014), ‘Algemeen Nederlands Woordenboek (ANW), A Dictionary of Contemporary Dutch’. In: www.elexicography.eu/wp-content/uploads/2014/11/Bled- ANW-2014.pdf Schoonheim, Tanneke and Rob Tempelaars (2010), 'Dutch Lexicography in Progress, The Algemeen Nederlands Woordenboek (ANW)'. In: Anne Dykstra and Tanneke Schoonheim (eds.), Proceedings of the XIV Euralex International Congress. Ljouwert, Fryske Akademy/Afûk, 179 (abstract), de volledige tekst op de bijgevoegde cd-rom. http://www.euralex.org/elx_proceedings/Euralex2010/059_Euralex_2010_3_SCHOONHEIM TEMPELAARS_Dutch Lexicography in Progress_the Algemeen Nederlands Woordenboek_ANW.pdf
  • 49. Poland Institute of the Polish Language PAS • AKA types not further specified in survey
  • 50. Poland Institute of the Polish Language at the Polish Academy of Sciences (IJP PAN) • Frequency Information • Form variation • Example sentences • Neologisms • Word senses • Grammatical patterns • Linguistic labels • Great Dictionary of Polish, www.wsjp.pl • Multiword expressions • Great Dictionary of Polish, www.wsjp.pl (idioms, proverbs, scientific multiword terms, other discontinuous textual units - so called functional units).
  • 51. Portugal Centro de Linguística da Universidade de Lisboa • Lemma List • Reference Corpus of Contemporary Portuguese http://www.clul.ul.pt/en/resources/183- reference-corpus-of-contemporary-portuguese-crpc • Frequency Information • Multifunctional computational lexicon of contemporary portuguese http://www.clul.ul.pt/en/research-teams/194-multifunctional-computational-lexicon-of- contemporary-portuguese • Example sentences • Dicionário da Academia das Ciências de Lisboa • Multiword expressions Word combinations in the Portuguese language http://www.clul.ul.pt/en/research- teams/187-combina-pt-word-combinations-in-portuguese-language • Lexical-semantic relations
  • 52. Portugal Centro de Linguística da Universidade Nova de Lisboa Faculdade de Ciências Sociais e Húmanas • Lemma List • Frequency Information • Form variation • Example sentences • Multiword expressions • Neologisms • Translation equivalents • Knowledge rich contexts • Lexical-semantic relations • Word senses • Grammatical patterns • For LSP
  • 53. Slovakia Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences • Lemma List • Form variation • Handbook of Slovak Nouns http://slovniky.korpus.sk/?d=noundb • Example sentences • Handbook of Slovak Nouns http://slovniky.korpus.sk/?d=noundb Parallel Corpora Phrases (en-sk, cs-sk, bg-sk) http://slovniky.korpus.sk/ • Frequency Information • Handbook of Slovak Nouns http://slovniky.korpus.sk/?d=noundb Dictionary of Contemporary Slovak http://www.juls.savba.sk/oddelenie_sucas_lexikografie_vyskumna_cinnost.html Slovak-Czech Dictionary http://gacr311.ujc.cas.cz/web/ • Multiword expressions • Dictionary of Slovak Collocations http://vronk.net/wicol • Translation equivalents • phrases from parallel corpora (en-sk,cs-sk,bg-sk) http://slovniky.korpus.sk/ • Grammatical patterns • Slovak Valency Dictionary (internal database, no URL yet)
  • 54. Slovenia University of Ljubljana, Faculty of Arts; Trojina, Institute for Applied Slovene Studies; Jožef Stefan Institute • Lemma List • Frequency Information • Multiword expressions • Grammatical patterns • Linguistic labels • Communication in Slovene: http://eng.slovenscina.eu Slovene Lexical Database: http://eng.slovenscina.eu/spletni-slovar/leksikalna-baza Sloleks - morphological lexicon: http://eng.slovenscina.eu/sloleks/opis Termis: http://www.termis.fdv.uni-lj.si/index- en.html • Form Variation • Communication in Slovene: http://eng.slovenscina.eu Ortography Guide: http://eng.slovenscina.eu/portali/slogovni-prirocnik
  • 55. Spain Universidade da Coruña and Real Academia Galega • Example sentences • Neologisms • Definitions • Lexical-semantic relations • Word senses • Linguistic labels Spanish-Galician Dictionary of the Royal Galician Academy No publications on the automatic acquisition of knowledge
  • 56. Spain University Institute for Applied Linguistics (Pompeu Fabra University) • Lemma list • Frequency information • Example sentences • Grammatical patterns Terminus 2.0, a web application for corpus and terminology managment http://terminus.iula.upf.edu • Neologisms Buscaneo http://obneo.iula.upf.edu/buscaneo/
  • 57. Sweden University of Gothenburg, Dpt. of Swedish, Språkbanken • Lemma lists 1. Kelly (http://spraakbanken.gu.se/eng/kelly) 2. Academic Wordlist (AO, http://spraakbanken.gu.se/eng/forskning/akademiska-ordlistor) 3. SVALex (ongoing, target Swedish as a second language lexicon); • Form variation 1) Diabase (http://spraakbanken.gu.se/eng/forskning/diabase); 2) Mathir (http://spraakbanken.gu.se/eng/mathir); • MWEs Constructicon (http://spraakbanken.gu.se/eng/sweccn); • Example sentences (HitEx (http://spraakbanken.gu.se/larka/larka_hitex_index.html) • Definitions (Semantic Interoperability and Data Mining in Biomedicine (http://cordis.europa.eu/project/rcn/71155_en.html) • Lexical-semantic relations (SweFN++ (http://spraakbanken.gu.se/eng/swefn) • Word senses (Distributional Methods to Represent the Meaning of Frames and Constructions (http://spraakbanken.gu.se/eng/corpsem) • Grammatical patterns (Culturomics (http://spraakbanken.gu.se/eng/culturomics) • Linguistic labels (1. A Swedish vocation list 2. ongoing PhD thesis on automatic readability classification of texts and sentences 3. Semantics in Storytelling in Swedish Fiction (list of relations, named entity recognition, aliases)
  • 58. Switzerland École Polytechnique Fédérale de Lausanne • Example sentences: Kamusi Global Online Living Dictionary http://kamusi.org
  • 59. Denmark / France Aarhus University, Business and Social Sciences, Department of Business Communication; Université de Bourgogne, Maison des Sciences de l'Homme • Lemma list • Form variation • Example sentences • Multiword expressions • Neologisms • Knowledge rich contexts • Oenolex, wine dictionary • Other • Discourse markers acquisition and discourse interaction markers

Editor's Notes

  1. Positions of those completing the survey