Electronic dictionaries in writing tools:
user needs and models for user interaction
Ulrich Heid
Universität Hildesheim,
Institut für Informationswissenschaft und Sprachtechnologie,
Universitätsplatz 1, D-31141 Hildesheim, Germany
Santiago de Compostela: Multilex-2015,
October 2015
Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 1 / 32
Overview
• Framework: Lexicographic Function Theory
and its implications for e-dictionary making
• User needs:
• General aspects
• Needs in text production –
and proposals from the literature to satisfy them:
• Needs resulting from linguistic complexity
• Needs resulting from different levels of knowledge of users
• Models of interaction:
• Information on demand
• (New) Ways of presenting lexicographic data
• Conclusion: lessons learnt
Context — and Warning
Projects – cooperation
• This presentation does not contain anything new:
it re-arranges and re-interprets recent work,
offering a practical state of the art rather than abstract visions
• Based on cooperation in
SeLA – Scientific e-Lexicography for Africa:
Project funded by BMBF (05-2012 – 12-2015) and organized by DAAD
• University of Pretoria Theo Bothma – Daan Prinsloo – Elsabé Taljard
• University of Stellenbosch Rufus H. Gouws
• UNISA, University of South Africa Sonja E. Bosch
• University of Namibia Herman Beyer
• University of Hildesheim Gertrud Faaß
Framework and reminder: Lexicographic Function Theory
Dictionaries as information tools Tarp 2008 etc.
• The dictionary provides data from which users can derive information
to satisfy a given need
• An “ideal” dictionary
provides the user with
exactly the {types | amount} of data
which he/she needs
• Assumption in FT:
Lexicographers (should) know
what is best for a given user (type)
→ different types of (e-)dictionaries
→ different data offers
[Figure (labels only): potential user; user situation; need for information; dictionary with lexicographical data; extraction of information; satisfaction of needs]
Framework and reminder: Lexicographic Function Theory
Parameters influencing the process of information derivation Tarp 2008 etc.
• Needs of users arising in different situations:
• Cognitive needs: learn about “things” or words
• Communicative needs:
• Text production vs. text reception
• Monolingual vs. bilingual
• etc.
• Users’ pre-existing knowledge
• Knowledge of the targeted language
• Knowledge of the targeted domain (e.g. in specialized dictionaries)
• Knowledge about using the (e-)dictionary,
or, more generally,
about using electronic information tools
• Awareness of the use situation and needs
Implications of user needs and pre-existing knowledge
A view on the scenario of lexicography
• To satisfy different user needs,
lexicographers will collect large amounts of lexicographic data
• For each type of need and/or for each type of user,
a specific subset of the data will be needed
• Thus a filtering approach is necessary,
where the filter is defined
according to
user types and needs
[Figure: lexicographic data → filters (defined by specifications) → dict-1, dict-2, dict-3 → user-1, user-2, ..., user-n]
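The filtering idea can be sketched as follows, assuming (hypothetically) that every data item in the shared database is tagged with the lexicographic functions and user types it serves. `DataItem`, `make_filter` and `derive_dictionary` are illustrative names, not part of any tool cited here.

```python
from dataclasses import dataclass

# Hypothetical data model: one shared database of lexicographic data,
# each item tagged with the functions and user types it is meant for.
@dataclass(frozen=True)
class DataItem:
    category: str          # e.g. "collocation", "definition", "grammar_note"
    content: str
    functions: frozenset   # lexicographic functions, e.g. {"production"}
    user_types: frozenset  # e.g. {"learner", "expert"}

def make_filter(function, user_type):
    """Return a predicate that keeps only items relevant to one user situation."""
    def keep(item):
        return function in item.functions and user_type in item.user_types
    return keep

def derive_dictionary(data, function, user_type):
    """One database, many dictionaries: apply the filter to the shared data."""
    keep = make_filter(function, user_type)
    return [item for item in data if keep(item)]

SAMPLE_DATA = [
    DataItem("collocation", "prendre une douche",
             frozenset({"production"}), frozenset({"learner", "expert"})),
    DataItem("definition", "<definition of douche>",
             frozenset({"reception"}), frozenset({"learner", "expert"})),
    DataItem("grammar_note", "<article use in the collocation>",
             frozenset({"production"}), frozenset({"learner"})),
]
```

Each "dict-n" of the figure then corresponds to one call, e.g. `derive_dictionary(SAMPLE_DATA, "production", "learner")`; the data are collected once and only the filter changes.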
Implications of user needs and pre-existing knowledge
Lexicographic scenario: need for well-defined dictionary specifications
Dictionary plan Gouws 2013
• Lexicographic data categories:
• Must be clearly distinguished, categorized and marked up
• Must be presentable in different forms, Spohr 2012
e.g. with different degrees of specialization, different metalanguage, etc.
• Filtering:
• By lexicographic function
• According to pre-existing knowledge
→ Selection of data categories
→ Selection of presentation modes
[Figure: lexicographic data → filters (defined by specifications) → dict-1, dict-2, dict-3 → user-1, user-2, ..., user-n]
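The two selection steps (data categories, then presentation modes) might look like this in a minimal sketch. It assumes, hypothetically, that labels are stored as abstract IDs with one surface form per metalanguage and degree of specialization; the label texts below are illustrative, not taken from Spohr 2012.

```python
from dataclasses import dataclass

# Hypothetical presentation variants of one abstract label:
# (metalanguage, degree of specialization) -> surface text.
LABEL_VARIANTS = {
    "NUM_SG": {
        ("en", "expert"): "sing.",
        ("en", "lay"): "usually in the singular",
        ("de", "expert"): "Sg.",
        ("de", "lay"): "meist im Singular",
    },
}

@dataclass(frozen=True)
class Profile:
    categories: tuple   # data categories this user type needs, in display order
    metalanguage: str   # "en" or "de" in this sketch
    level: str          # "expert" or "lay"

def render(entry, profile):
    """Select data categories, then choose a presentation mode for each label."""
    lines = []
    for cat in profile.categories:
        if cat not in entry:
            continue  # category not recorded for this entry
        value = entry[cat]
        if value in LABEL_VARIANTS:
            # raises KeyError if no variant exists for this profile
            value = LABEL_VARIANTS[value][(profile.metalanguage, profile.level)]
        lines.append(f"{cat}: {value}")
    return lines

# Hypothetical entry, marked up with abstract data categories.
ENTRY = {
    "collocation": "den Rechtsweg einschlagen",
    "number": "NUM_SG",
    "gloss": "[to] take legal action",
}
```

The entry is marked up once; a lay English-speaking user sees "usually in the singular", whereas a German-speaking expert sees "Sg.".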
User needs: general aspects
Parameters relevant for data selection
• Lexicographic functions
• Text production ←→ text reception
• Elements of cognitive needs involved in a communicative situation:
learning while producing text – training for text production
• Properties of the targeted linguistic phenomena
• Lexicographic data categories needed for a given function:
words — word combinations — linguistic properties — ...
• Interaction of lexical objects with “grammar”
• Pre-existing knowledge of users
• Lexical items of the targeted language
• Linguistic properties of the targeted lexical items
• Grammatical knowledge of the targeted language
User needs in text production
Linguistic aspects
• Need to know a lexical object
• Access:
• From a “concept”
• From a source-language item
• Choice among alternatives, based on the properties of each
• Need to insert the lexical object into an upcoming context:
construction — sentence — discourse — text (type) ...
• Access to linguistic properties of lexical objects,
on different levels of linguistic description
• Some properties may act as constraints and rule out certain options
User needs in text production
Levels of interactivity – interaction models Prinsloo, Bothma and Heid 2015
• Mainly interactive tools:
with different amounts of user interaction required
• Step-wise build-up of a construction or a sentence
• Guidance through options of lexical or grammatical choice
• Guidance with cognitively oriented elements:
lexical or grammatical explanations
• Mainly automatic tools:
User input triggers automatic processing
• Checking tools: Verlinde 2014 and ILT online
grammar checkers — style checkers — collocation checkers ...
• (Automatic) translation functions
Phenomenon-related needs: collocations as a case in point
An example of criteria for the selection of lexicographic data categories
• Underlying notion of collocation:
In the tradition of pedagogical lexicography: Hausmann 2006, Mel’čuk
• Lexically and/or pragmatically constrained,
language-specific: Bartsch 2004
FR prendre une douche ←→ IT fare la doccia
• Base plus collocate: {douche | doccia} ⊕ verb
• Syntactic relationship between base and collocate
• Lexicographic data needed: Gouws 2015
• Knowledge of the collocation:
preferred lexical combination
• Knowledge about the collocation:
properties relevant for its insertion into context
Phenomenon-related needs: collocations as a case in point
Types of knowledge about collocations relevant for text production – Examples
• Morphosyntax: e.g.
• Number preferences:
DE den Rechtsweg[sing.] einschlagen ([to] take legal action)
←→ IT adire le vie[plural] legali
• Determination: IT fare la doccia, ([to] take a shower)
DE sein Veto einlegen ([to] veto)
• Syntactic valency: e.g.
[to] be in a position (+ to +INF)
DE in der Lage sein (+ zu + INF)
• Collocational preferences: e.g.
DE {scharfe|heftige|massiv(e)...} Kritik üben ([to] criticize severely)
• Pragmatic preferences: e.g. by text type:
FR medical experts: X accroît le risque de Y (X increases the risk of Y)
FR medical lay persons: X augmente le risque de Y Wandji Tchami et al. 2015
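A collocation record needs slots for both kinds of knowledge: the preferred combination itself and the properties that govern its insertion into context. The following minimal sketch assumes a hypothetical record layout (`Collocation`, `collocates_for`) populated with the examples above. Filtering by text type illustrates how a pragmatic preference could drive selection.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record: base + collocate + syntactic relation, plus
# properties relevant for inserting the collocation into context.
@dataclass
class Collocation:
    base: str
    collocate: str
    relation: str                     # syntactic relation, e.g. "verb+object"
    form: str                         # citation form of the combination
    number: Optional[str] = None      # morphosyntactic preference, e.g. "sing."
    valency: Optional[str] = None     # e.g. "+ zu + INF"
    text_types: frozenset = frozenset()  # pragmatic preference; empty = unrestricted

COLLOCATIONS = [
    Collocation("risque", "accroître", "verb+object",
                "X accroît le risque de Y", text_types=frozenset({"expert"})),
    Collocation("risque", "augmenter", "verb+object",
                "X augmente le risque de Y", text_types=frozenset({"lay"})),
    Collocation("Rechtsweg", "einschlagen", "verb+object",
                "den Rechtsweg einschlagen", number="sing."),
    Collocation("Lage", "sein", "PP+verb",
                "in der Lage sein", valency="+ zu + INF"),
]

def collocates_for(base, text_type=None):
    """Return collocations of a base, optionally restricted to one text type."""
    hits = [c for c in COLLOCATIONS if c.base == base]
    if text_type is not None:
        # entries without a text-type restriction are kept for every text type
        hits = [c for c in hits if not c.text_types or text_type in c.text_types]
    return [c.form for c in hits]
```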
Phenomenon-related needs: collocations as a case in point
Access to data on collocations: different scenarios
• Text production: onomasiological access (cf. Giacomini 2013)
- known: base lemma + reading, meaning of the word combination → searched for: typical collocation (lexical rendition)
- possibly known: syntactic environment → searched for: fit into the text/sentence to be built
• Text reception: semasiological, form-based access
- known: (element of a) word (combination) → searched for: meaning in context, plus pragmatic properties
Phenomenon-related needs: collocations as a case in point
An example: different kinds of access (data from OCDSE)
Production
Reading 1: forward movement [military]
• ADJ + advance
- [speed] rapid ∼
- [agent] German ∼, Allied ∼, etc.
• V + advance
- [make] make an ∼ on X
The regiment made an advance on the enemy lines.
Reading 2: development (often in the plural)
• ADJ + advance
- [amount] considerable ∼; big ∼, substantial ∼; dramatic ∼, enormous ∼, great ∼, spectacular ∼, tremendous ∼
• V + advance
- [make] make ∼s (in/on) [plural!]
Reading 3: amount of money
• ADJ + advance
- [quantity] small ∼, large ∼
- [type] cash ∼
• V + advance
- [provide] give so. an ∼, pay so. an ∼
The university pays me an advance for this business trip.
Reception
• Readings
(1) [military] forward movement
(2) development
(3) amount of money
• Typical adjectives
- Allied etc. (cf. German etc.) (1)
- big (= considerable) (2)
- cash (3)
- considerable (= big) (2)
- dramatic (2)
- German (cf. Allied, etc.) (1)
- great (2)
- important (1)
- large (3)
- notable (2)
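The reception view can, in principle, be derived from the production data by inverting the index. A minimal sketch follows. It uses only the adjectives listed on the production side (the reception list above also contains items such as "important" and "notable" that do not appear there), and the function name is illustrative.

```python
from collections import defaultdict

# Production side (onomasiological): reading -> adjective collocates,
# abridged from the OCDSE-based example above.
PRODUCTION = {
    1: ["rapid", "German", "Allied"],
    2: ["considerable", "big", "substantial", "dramatic", "enormous",
        "great", "spectacular", "tremendous"],
    3: ["small", "large", "cash"],
}

def reception_index(production):
    """Invert reading -> collocates into collocate -> readings (semasiological)."""
    index = defaultdict(set)
    for reading, adjectives in production.items():
        for adj in adjectives:
            index[adj].add(reading)
    # alphabetical, case-insensitive, as in a form-based reception listing
    return {adj: sorted(index[adj]) for adj in sorted(index, key=str.lower)}
```

Storing the data once and generating both views keeps production and reception access consistent. An adjective shared by several readings would simply list several reading numbers.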
Phenomenon-related needs: collocations as a case in point
Access to collocational data for text production
Proposal for onomasiological access: example from Giacomini 2011: 263
• Search:
Base: paura (fear) | syntactic filter: ⊕ PP (di) | semantic filters: ⊕ cause (= natural phenomenon)
• Result:
paura [...]
colloc.: paura ⊕ PP (di)
– causa (cause): elementi e fenomeni naturali (natural elements and phenomena):
paura del terremoto (fear of earthquakes); paura del fuoco (fear of fire); ...
• Option for a comparison with collocations of quasi-synonyms:
paura del fuoco ↔ panico per il fuoco; *spavento, *ansia
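The search above combines a base with optional syntactic and semantic filters. The sketch below assumes a hypothetical flat table of entries and invented filter labels such as `"PP(di)"` and `"cause:natural_phenomenon"`. It also shows the quasi-synonym comparison: an empty result for a base mirrors the starred items (*spavento) on the slide.

```python
from dataclasses import dataclass

# Hypothetical flat table of collocation entries; filter labels are invented.
@dataclass(frozen=True)
class CollocEntry:
    base: str
    syntax: str     # syntactic pattern, e.g. "PP(di)"
    semantic: str   # semantic class of the complement, e.g. a type of cause
    example: str

DB = [
    CollocEntry("paura", "PP(di)", "cause:natural_phenomenon", "paura del terremoto"),
    CollocEntry("paura", "PP(di)", "cause:natural_phenomenon", "paura del fuoco"),
    CollocEntry("panico", "PP(per)", "cause:natural_phenomenon", "panico per il fuoco"),
]

def search(db, base, syntax=None, semantic=None):
    """Onomasiological search: base lemma plus optional syntactic/semantic filters."""
    return [e.example for e in db
            if e.base == base
            and (syntax is None or e.syntax == syntax)
            and (semantic is None or e.semantic == semantic)]

def compare_quasi_synonyms(db, bases, semantic):
    """Same semantic filter applied to several quasi-synonymous bases."""
    return {b: search(db, b, semantic=semantic) for b in bases}
```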
Phenomenon-related needs: collocations as a case in point
A wireframe prototype for a collocation dictionary (1/3)
Step 1: Enter the base lemma, with its reading if it is polysemous
Phenomenon-related needs: collocations as a case in point
A wireframe prototype for a collocation dictionary (2/3)
Step 2: Semantic selection
Phenomenon-related needs: collocations as a case in point
A wireframe prototype for a collocation dictionary (3/3)
Step 3: Syntactic selection
Needs due to different levels of pre-existing knowledge
Lexical selection as a complex decision task — Bantu languages
Copulatives in Northern Sotho:
how to translate [to] be (1/3) Bothma et al. 2013
• Linguistic parameters of the lexico-grammatical selection task:
• Lexical semantics: *3
- Identifying: ke lengwalo (this is a letter)
- Descriptive: mosadi yo o bohlale (this woman is clever)
- Associative: o na le Sara (he is (together) with Sara)
• Aktionsart-like: stative ←→ inchoative *2
• Mood: indicative ←→ situative ←→ relative *3
• Person or noun class *(14+4)
• Positive ←→ negative *2
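The starred factors indicate why the selection task is complex. If all parameters combined freely, the learner would face up to 3 × 2 × 3 × (14 + 4) × 2 = 648 possible choices. This is an upper bound computed from the slide's counts only; the actual number of distinct copulative forms may be lower if some combinations do not occur.

```python
from math import prod

# Number of values per parameter, as given on the slide (upper bound only).
PARAMETERS = {
    "lexical semantics": 3,          # identifying, descriptive, associative
    "aktionsart": 2,                 # stative, inchoative
    "mood": 3,                       # indicative, situative, relative
    "person or noun class": 14 + 4,
    "polarity": 2,                   # positive, negative
}

def combinations(params):
    """Size of the full cross-product of all parameter values."""
    return prod(params.values())
```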
Needs due to different levels of pre-existing knowledge
Copulatives in Northern Sotho (2/3)
Model for stepwise guidance:
Lexical selection as a decision tree
[Figure: decision tree with choice points A to G; at A the user chooses B or C, at B D or E, at C F or G]
• Choice points: A, B, C, ...
• Provides only relevant choices,
depending on prior selection(s)
• Presence of cognitively relevant data at each choice point:
Grammatical hints about the choice at hand — examples
→ A combination of dictionary and grammar,
with on-demand support for text production
• Systematic path to the solution
• Decision-relevant information provided:
• Options at each choice point (minimal amount of data)
• Grammatical hints and examples only if needed by the user
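The stepwise guidance can be sketched as a walk through a tree of choice points. At each point, only the options of the current node are offered, and the grammatical hint is shown only on request. This sketch assumes a hypothetical `Choice` structure and covers only the first two levels (Aktionsart, then reading). Its leaves are feature bundles, not Northern Sotho forms; the real tool continues with mood, noun class and polarity.

```python
from dataclasses import dataclass, field

@dataclass
class Choice:
    question: str
    hint: str = ""   # grammatical hint/example, shown only on demand
    options: dict = field(default_factory=dict)  # answer -> Choice or leaf string

def walk(root, answers, show_hints=False):
    """Follow the user's answers; return (leaf, questions shown along the path)."""
    node = root
    transcript = []
    for answer in answers:
        if not isinstance(node, Choice):
            raise ValueError("the path is already complete")
        # only the options relevant at this point are offered
        line = f"{node.question} {sorted(node.options)}"
        if show_hints and node.hint:
            line += f"  (hint: {node.hint})"
        transcript.append(line)
        if answer not in node.options:
            raise ValueError(f"{answer!r} is not a relevant choice here")
        node = node.options[answer]
    return node, transcript

READINGS = ("identifying", "descriptive", "associative")

# Hypothetical partial tree; leaves are feature bundles selected so far.
TREE = Choice(
    "Stative or inchoative?",
    hint="stative: a state holds; inchoative: a state comes about",
    options={
        "stative": Choice(
            "Identifying, descriptive or associative?",
            hint="identifying, e.g. ke lengwalo 'this is a letter'",
            options={r: f"stative {r}" for r in READINGS},
        ),
        "inchoative": Choice(
            "Identifying, descriptive or associative?",
            options={r: f"inchoative {r}" for r in READINGS},
        ),
    },
)
```

For example, `walk(TREE, ["stative", "descriptive"])` ends at the leaf `"stative descriptive"` after showing two questions, and an option that is not relevant at the current point is rejected.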
Needs due to different levels of pre-existing knowledge
Copulatives in Northern Sotho – sample steps (3a/3)
• Selecting stative vs. inchoative copulative
Needs due to different levels of pre-existing knowledge
Copulatives in Northern Sotho – sample steps (3b/3)
• Selecting one of the readings of the copulative:
identifying – descriptive – associative
Needs due to different levels of pre-existing knowledge
Copulatives in Northern Sotho – sample steps (3c/3)
• Stative descriptive copulative selected,
selection among moods: indicative – situative – relative
Needs due to different levels of pre-existing knowledge
Copulatives in Northern Sotho – sample steps (3d/3)
• Almost all features selected:
only the noun class remains
Needs due to different levels of pre-existing knowledge
Copulatives in Northern Sotho – sample steps (3e/3)
• Noun class selected:
finally, select positive vs. negated
Needs due to different levels of pre-existing knowledge
Combining data for communicative and cognitive needs
Learner-oriented tools for text production: Bosch/Faaß 2014
e-Zulu (and e-Xhosa) dictionary and grammar trainer Sanasi 2015
• Focus on the Zulu possessive construction:
• Lexical choice of nominals for possessor and possession
• Noun classes of possessor and possession
• Noun-class-dependent connector (expressing the possessive relation)
• Morphophonological adaptation rules
• Stepwise guidance on demand:
• Nominal lexemes can be input in Zulu or English
• The noun class and the connector are either
entered by the user or provided by the system
• etc.
• Reminder of rules on demand
→ From stepwise guidance to full translation
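The last step of the possessive construction, a morphophonological adaptation, can be sketched as simple vowel coalescence. The final -a of the connector merges with the possessor's initial vowel (a + a → a, a + i → e, a + u → o). This is a deliberate simplification under stated assumptions. The noun-class-dependent connector is passed in by the caller rather than looked up, and possessors without an initial vowel (e.g. proper names), which follow other rules, are rejected rather than modelled. `possessive` is a hypothetical helper, not the e-Zulu implementation.

```python
# Simplified coalescence of the connector's final -a with the
# possessor noun's initial vowel.
COALESCENCE = {"a": "a", "i": "e", "u": "o"}

def possessive(possession: str, connector: str, possessor: str) -> str:
    """Build 'possession connector+possessor' with simplified vowel coalescence.

    `connector` is the possessive concord agreeing with the noun class of the
    possession; in this sketch it must be supplied by the caller.
    """
    if not connector.endswith("a"):
        raise ValueError("this sketch expects a connector ending in -a")
    initial = possessor[:1]
    if initial not in COALESCENCE:
        raise ValueError("possessors without an initial vowel are not modelled here")
    joined = connector[:-1] + COALESCENCE[initial] + possessor[1:]
    return f"{possession} {joined}"
```

For example, with the class 8 connector za-, `possessive("izinto", "za", "umuntu")` yields "izinto zomuntu" (the person's things).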
Needs due to different levels of pre-existing knowledge
e-Zulu dictionary (1/2) Bosch/Faaß 2014
• Input in English: rooms of hotel
• Choice options:
• Translation only
• Stepwise explanation of Zulu rules applied
Needs due to different levels of pre-existing knowledge
e-Zulu dictionary (2/2) Bosch/Faaß 2014
Needs due to different levels of pre-existing knowledge
Combining data for communicative and cognitive needs
Learner-oriented tools for text production: Prinsloo et al. 2014, 2015
Sepedi (= Northern Sotho) sentence builder for speakers of English
• Phenomena:
• Lexical selection: nominals, verbs
• Noun class system of Sepedi — concords and pronouns
• Grammatical rules for valency constructions, relative clauses, etc.
• Same principles as with Zulu possessives:
• At each step in text production, the user may decide
whether and how much help to get from the tool (individualization: Tarp 2011)
• User input may be in either English or Sepedi,
with option open at each step of the sentence construction
• Can be integrated with a large English → Sepedi dictionary
• Grammatical information on demand
Models for user interaction
Information on demand Bothma 2011
• Basic amount of data is available by default
• Additional data may be accessed via unfoldable items:
• Grammatical explanations in decision trees Bothma et al. 2013
• “Info” button in Sepedi sentence builder Prinsloo et al. 2015
• Option to see explanations in learning tools Sanasi 2015
⇒ Open questions:
• Should the amount of data required be decided beforehand (profile-based dictionaries)
or at each step in the text production process?
• How much use do users make of the extra data on offer? Trap-Jensen 2010
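Both options raised in the open questions can be modelled on the same entry structure. Basic data are always shown. Unfoldable panels are opened either per step by the user or in advance by a profile. In this sketch, only per-step unfolds are logged, which is one way of collecting data on how often the extra offer is used. The names and the profile table are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Entry:
    lemma: str
    basic: list                                       # shown by default
    extra: dict = field(default_factory=dict)         # label -> unfoldable data
    unfold_log: Counter = field(default_factory=Counter)

    def view(self, unfold=(), log=True):
        """Basic data always; extra data only for the panels that are opened."""
        lines = [self.lemma, *self.basic]
        for label in unfold:
            if label in self.extra:
                if log:
                    self.unfold_log[label] += 1   # user asked for it explicitly
                lines.append(f"[{label}] {self.extra[label]}")
        return lines

# Profile-based alternative: panels decided beforehand per user type (hypothetical).
PROFILES = {"expert": (), "learner": ("grammar",)}

def view_for_profile(entry, user_type):
    return entry.view(unfold=PROFILES[user_type], log=False)
```

Comparing `unfold_log` counts across users would give a rough indication of whether the extra data are consulted at all.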
Models for user interaction
Linguistic complexity ←→ interactional simplicity
• Dilemma:
• Complex linguistic decision processes
may require complex descriptions (e.g. Bantu languages, collocation selection)
• But:
Many users want simple tools, easy to use:
• Few clicks
• Short explanations
• Little effort before getting to the result Heid/Zimmermann 2012
• Proposal:
• Providing guidance tools only on demand,
in addition to “standard” dictionary entries
• Maybe adding non-linear guidance devices, especially for learners:
• Graphical elements Runte 2015
• Interactive elements, for learners to explore linguistic phenomena
Models for user interaction
Graphical display of lexical relations Runte 2015
• Display of relationships between lexemes:
• Paradigmatic:
• Synonyms, Antonyms
• Hyp(er)onyms
• Syntagmatic:
• Typical adjectives
• Typical verbs, ...
[Figure: graph centred on Arbeitnehmer, linked to the nouns Angestellter, Arbeiter, Erwerbstätiger, Arbeitskraft; to the adjectives qualifiziert, hochqualifiziert (<Qualifikation>); and to the verbs einstellen, beschäftigen, kündigen, arbeiten]
• Analyzed in eye-tracking studies:
the presentation works well for learners
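A display like the one above can be driven by a simple typed graph: edges carry a relation label, and the display groups neighbours into one "spoke" per relation. The assignment of the figure's words to relation types below is an illustrative reading of the figure, not Runte's actual data model.

```python
# Hypothetical encoding of the figure: (lexeme, relation, related lexeme).
EDGES = [
    ("Arbeitnehmer", "paradigmatic", "Angestellter"),
    ("Arbeitnehmer", "paradigmatic", "Arbeiter"),
    ("Arbeitnehmer", "paradigmatic", "Erwerbstätiger"),
    ("Arbeitnehmer", "paradigmatic", "Arbeitskraft"),
    ("Arbeitnehmer", "syntagmatic: adjective", "qualifiziert"),
    ("Arbeitnehmer", "syntagmatic: adjective", "hochqualifiziert"),
    ("Arbeitnehmer", "syntagmatic: verb", "einstellen"),
    ("Arbeitnehmer", "syntagmatic: verb", "beschäftigen"),
    ("Arbeitnehmer", "syntagmatic: verb", "kündigen"),
    ("Arbeitnehmer", "syntagmatic: verb", "arbeiten"),
]

def build_graph(edges):
    """lexeme -> relation -> list of related lexemes (insertion order kept)."""
    graph = {}
    for source, relation, target in edges:
        graph.setdefault(source, {}).setdefault(relation, []).append(target)
    return graph

def neighbours(graph, lexeme, relation):
    return graph.get(lexeme, {}).get(relation, [])

def star_layout(graph, lexeme):
    """Group neighbours by relation, one 'spoke' per relation type, for display."""
    return [f"{lexeme} --{rel}--> {', '.join(targets)}"
            for rel, targets in graph.get(lexeme, {}).items()]
```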
Conclusion
Lessons learnt from overview of recent work
• Parameters relevant for
the design of dictionaries in writing support tools:
• Properties of targeted lexical objects:
Addressing linguistic complexity
• Pre-existing knowledge of users:
On lexical objects and their insertion into text
• Flexibility with respect to interaction models:
Combining automatic and interactive use
• Current approaches (mainly from SeLA):
• Constraint-based selection in a collocation dictionary
• Stepwise guidance in decision trees
• Learners’ bilingual dictionaries with explanations
• Stepwise sentence builder:
Flexible amounts of support
• Graphical presentation
Future work
• User testing of prototypes,
to understand which approaches work best
• From mock-ups and prototypes
to tools with sizeable lexical resources:
• e-Zulu: several hundreds of items
• Sepedi sentence builder: work towards large grammatical coverage
Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 34 / 32

Electronic dictionaries in writing tools: user needs and models for user interaction

  • 1.
    Electronic dictionaries inwriting tools: user needs and models for user interaction Ulrich Heid Universit¨at Hildesheim, Institut f¨ur Informationswissenschaft und Sprachtechnologie, Universit¨atsplatz,1 — D 31141 Hildesheim, Germany Santiago de Compostela: Multilex-2015, October 2015 Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 1 / 32
  • 2.
    Overview • Framework: LexicographicFunction Theory and its implications for e-dictionary making • User needs: • General aspects • Needs in text production – and proposals from the literature to satisfy them: • Needs resulting from linguistic complexity • Needs resulting from different levels of knowledge of users • Models of interaction: • Information on demand • (New) Ways of presenting lexicographic data • Conclusion: lessons learnt Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 2 / 32
  • 3.
    Context — andWarning Projects – cooperation • This presentation does not contain anything new: it just re-arranges and re-interprets recent work: rather practical state of the art than abstract visions • Based on cooperation in SeLA – Scientific e-Lexicography for Africa: Project funded by BMBF (05-2012 – 12-2015) and organized by DAAD • University of Pretoria Theo Bothma – Daan Prinsloo – Elsab´e Taljard • University of Stellenbosch Rufus H. Gouws • UNISA, University of South Africa Sonja E. Bosch • University of Namibia Herman Beyer • University of Hildesheim Gertrud Faaß Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 3 / 32
  • 4.
    Framework and reminder:Lexicographic Function Theory Dictionaries as information tools Tarp 2008 etc. • The dictionary provides data from which users can derive information to satify a given need • An “ideal” dictionary provides the user with exactly that { types | amount of... } data which he/she needs • Assumption in FT: Lexicographers (should) know what is best for a given user (type) → different types of (e-)dictionaries → different data offers potential user user situation need for information lexicographical data extraction of inform.satisfaction of needs dictionary Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 4 / 32
  • 5.
    Framework and reminder:Lexicographic Function Theory Parameters influencing the process of information derivation Tarp 2008 etc. • Needs of users arising in different situations: • Cognitive needs: learn about “things” or words • Communicative needs: • Text production vs. text reception • Monolingual vs. bilingual • etc. • Users’ pre-existing knowledge • Knowledge of the targeted language • Knowledge of the targeted domain (e.g. in specialized dictionaries) • Knowledge about using the (e-) dictionary, or, more generally, about using electronic information tools • Awareness of the use situation and needs Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 5 / 32
  • 6.
    Implications of userneeds and pre-existing knowledge A view on the scenario of lexicography • To satisfy different user needs, lexicographers will collect large amounts of lexicographic data • For each type of need and/or for each type of user, a specific subset of the data will be needed • Thus a filtering approach is necessary, where the filter is defined according to user types and needs user−1 user−2 user−n dict−1 dict−2 dict−3 filterslexgr. data specifications Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 6 / 32
  • 7.
    Implications of userneeds and pre-existing knowledge Lexicographic scenario: need for well-defined dictionary specifications Dictionary plan Gouws 2013 • Lexicographic data categories: • Must be clearly distingushed, categorized and marked up • Must be presentable in different forms, Spohr2012 e.g. with different degrees of specialization, different metalanguage, etc. • Filtering: • By lexicographic function • According to pre-existing knowledge → Selection of data categories → Selection of presentation modes user−1 user−2 user−n dict−1 dict−2 dict−3 filterslexgr. data specifications Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 7 / 32
  • 8.
    User needs: generalaspects Parameters relevant for data selection • Lexicographic functions • Text production ←→ text reception • Elements of cognitive needs involved in a communicative situation: learning while producing text – training for text production • Properties of the targeted linguistic phenomena • Lexicographic data categories needed for a given function: words — word combinations — linguistic properties — ... • Interaction of lexical objects with “grammar” • Pre-existing knowledge in users • Lexical items of the targeted language • Linguistic properties of the targeted lexical items • Grammatical knowledge of the targeted language Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 8 / 32
  • 9.
    User needs intext production Linguistic aspects • Need to know a lexical object • Access: • From a “concept” • Form a source language item • Choice among alternatives, based on properties of each • Need to insert le lexical object into an upcoming context: construction — sentence — discourse — text (type) ... • Access to linguistic properties of lexical objects, on different levels of linguistic description • Some properties may act as constraints and rule out certain options Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 9 / 32
  • 10.
    User needs intext production Levels of interactvity – interaction models Prinsloo, Bothma and Heid 2015 • Mainly interactive tools: with different amounts of user interaction required • Step-wise build-up of a construction or a sentence • Guidance through options of lexical or grammatical choice • Guidance with cognitively oriented elements: lexical or grammatical explanations • Mainly automatic tools: User input triggers automatic processing • Checking tools: Verlinde 2014 and ILT online grammar checkers — style checkers — collocation checkers ... • (Autoomatic) translation functions Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 10 / 32
  • 11.
    Phenomenon-related needs: collocationsas a case in point An example of criteria for the selection of lexicographic data categories • Notion of collocation underlying: In the tradition of pedagogical lexicography Hausmann 2006, Mel’ˇcuk • Lexically and/or pragmatically constrained, language-specific: Bartsch 2004 FR prendre une douche ←→ IT fare la doccia • Base plus collocate: {douche | doccia} ⊕ verb • Syntactic relationship between base and collocate • Lexicographic data needed: Gouws 2015 • Knowledge of the collocation: preferred lexical combination • Knowledge about the collocation: properties relevant for its insetion into context Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 11 / 32
  • 12.
    Phenomenon-related needs: collocationsas a case in point Types of knowledge about collocations relevant for text production – Examples • Morphosyntax: e.g. • Number preferences: DE den Rechtswegsing. einschlagen ([to] take legal action) ←→ IT adire le vieplural legali • Determination: IT fare la doccia, ([to] take a shower) DE sein Veto einlegen ([to] veto) • Syntactic valency: e.g. [to] be in a position (+ to +INF) DE in der Lage sein (+ zu + INF) • Collocational preferences: e.g. DE {scharfe|heftige|massiv(e)...} Kritik ¨uben ([to ]criticize severely) • Pragmatic preferences: e.g. by text type: FR medical experts: X accroˆıt le risque de X (X increases the risk of Y) FR medical lay persons: X augmente le risque de X Wandji Tchami et al. 2015 Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 12 / 32
  • 13.
    Phenomenon-related needs: collocationsas a case in point Access to data on collocations Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 13 / 32
  • 14.
    Phenomenon-related needs: collocationsas a case in point Access to data on collocations Different scenarios Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 13 / 32
  • 15.
    Phenomenon-related needs: collocationsas a case in point Access to data on collocations • Text production: onomasiological access cf. Giacomini 2013 known searched for base lemma + reading Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 13 / 32
  • 16.
    Phenomenon-related needs: collocationsas a case in point Access to data on collocations • Text production: onomasiological access known searched for base lemma + reading meaning of word combination typical collocation (lexical rendition) Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 13 / 32
  • 17.
    Phenomenon-related needs: collocationsas a case in point Access to data on collocations • Text production: onomasiological access known searched for base lemma + reading meaning of word combination typical collocation (lexical rendition) maybe: syntactic environment fit into text/sentence to be built Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 13 / 32
  • 18.
    Phenomenon-related needs: collocationsas a case in point Access to data on collocations • Text production: onomasiological access known searched for base lemma + reading meaning of word combination typical collocation (lexical rendition) maybe: syntactic environment fit into text/sentence to be built • Text reception: semasiological, form-based access known searched for (element of) word (combination) meaning in context Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 13 / 32
  • 19.
    Phenomenon-related needs: collocationsas a case in point Access to data on collocations • Text production: onomasiological access known searched for base lemma + reading meaning of word combination typical collocation (lexical rendition) maybe: syntactic environment fit into text/sentence to be built • Text reception: semasiological, form-based access known searched for (element of) word (combination) meaning in context plus pragmatic properties Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 13 / 32
  • 20.
    Phenomenon-related needs: collocationsas a case in point An example: different kinds of access Data from OCDSE Production Reading 1: forward movement [military] • ADJ + advance - [speed] rapid ∼ - [agent] German ∼, Allied ∼, etc. • V + advance - [make] make an ∼on X The regiment made an advance on the enemy lines. Reading 2: development (often in the plural) • ADJ + advance - [amount] considerable ∼; big ∼, substantial ∼; dramatic ∼, enormous ∼, great ∼, spectacular ∼, tremendous ∼. • V + advance - [make] make ∼es (in/on) [plural!] Reading 3: amount of money • ADJ + advance - [quantity] small ∼, large ∼ - [type] cash ∼ • V + advance - [provide] give so. an ∼, pay so. an ∼ The university pays me an advance for this business trip. Reception • Readings (1) [military] forward movement (2) development (3) amount of money • Typical adjectives - Allied etc. (cf. German etc.) (1) - big (=considerable) (2) - cash (3) - considerable (=big) (2) - dramatic (2) - German (cf. Allied, etc.) (1) - great (2) - important (1) - large (3) - notable (2) Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 14 / 32
  • 21.
    Phenomenon-related needs: collocationsas a case in point Access to collocational data for text production Proposal for onomasiological access — example Giacomini 2011: 263 • Search: Base syntactic filter semantic filters paura fear ⊕ PP (di) ⊕ cause (= natural phenomenon) • Result: paura [...] colloc: paura ⊕ PP (di) – causa: elementi e fenomeni naturali: paura del terremoto; paura del fuoco; ... • Option for a comparison with collocations of quasi-synonyms: paura del fuoco ↔ panico per il fuoco; *spavento, *ansia Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 15 / 32
  • 22.
    Phenomenon-related needs: collocationsas a case in point A wireframe prototype for a collocation dictionary (1/3) Step 1: Enter base lemma possibly with reading, if it is polysemous Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 16 / 32
  • 23.
    Phenomenon-related needs: collocationsas a case in point A wireframe prototype for a collocation dictionary (2/3) Step 2: Semantic selection Heid (IwiSt/IMS) Text production dictionaries santiago15-fol 17 / 32
Phenomenon-related needs: collocations as a case in point
A wireframe prototype for a collocation dictionary (3/3)
Step 3: Syntactic selection
Needs due to different levels of pre-existing knowledge
Lexical selection as a complex decision task: Bantu languages
Copulatives in Northern Sotho: how to translate [to] be (1/3) (Bothma et al. 2013)
• Linguistic parameters of the lexico-grammatical selection task:
• Lexical semantics (×3):
  - Identifying: this is a letter = ke lengwalo
  - Descriptive: this woman is clever = mosadi yo o bohlale
  - Associative: he is (together) with Sara = o na le Sara
• Aktionsart-like: stative ←→ inchoative (×2)
• Mood: indicative ←→ situative ←→ relative (×3)
• Person or noun class (×(14+4))
• Positive ←→ negative (×2)
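Multiplying the option counts listed above shows why the selection is hard to make in one go. This assumes, for illustration only, that the parameters combine freely; in the language some combinations may be unavailable, so the product is an upper bound on the cells a user would have to tell apart. A stepwise procedure instead asks one question per parameter.

```python
from math import prod

# Number of options per parameter, as listed on the slide.
PARAMETERS = {
    "lexical semantics": 3,         # identifying / descriptive / associative
    "aktionsart": 2,                # stative / inchoative
    "mood": 3,                      # indicative / situative / relative
    "person or noun class": 14 + 4,
    "polarity": 2,                  # positive / negative
}


def flat_choice_space(params):
    """Combinations to tell apart if all options were offered at once (assumes free combination)."""
    return prod(params.values())


def stepwise_cost(params):
    """(number of questions, largest option list) when one parameter is chosen per step."""
    return len(params), max(params.values())


print(flat_choice_space(PARAMETERS))  # 648
print(stepwise_cost(PARAMETERS))      # (5, 18)
```

So 3 × 2 × 3 × 18 × 2 = 648 combinations collapse into five questions of at most 18 options each.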
Needs due to different levels of pre-existing knowledge
Copulatives in Northern Sotho (2/3)
Model for stepwise guidance: lexical selection as a decision tree
[Figure: decision tree with choice points A to G: at A, choose B or C; at B, choose D or E; at C, choose F or G]
• Choice points: A, B, C, ...
• Provides only the choices that are relevant, given the prior selection(s)
• Cognitively relevant data present at each choice point: grammatical hints about the choice at hand, with examples
→ A combination of dictionary and grammar, with on-demand support for text production
• Systematic path to the solution
• Decision-relevant information provided:
  - Options at each choice point (minimal amount of data)
  - Grammatical hints and examples only if needed by the user
Needs due to different levels of pre-existing knowledge
Copulatives in Northern Sotho: sample steps (3a/3)
• Selecting the stative vs. the inchoative copulative
Needs due to different levels of pre-existing knowledge
Copulatives in Northern Sotho: sample steps (3b/3)
• Selecting one of the readings of the copulative: identifying, descriptive or associative
Needs due to different levels of pre-existing knowledge
Copulatives in Northern Sotho: sample steps (3c/3)
• Stative descriptive copulative selected; selection among moods: indicative, situative or relative
Needs due to different levels of pre-existing knowledge
Copulatives in Northern Sotho: sample steps (3d/3)
• Almost all features selected; only the noun class remains
Needs due to different levels of pre-existing knowledge
Copulatives in Northern Sotho: sample steps (3e/3)
• For the noun class: select positive vs. negated
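A decision tree of this kind can be walked one feature at a time, in the order of the sample steps (3a to 3e). In this hypothetical sketch each step computes its options from the selections made so far (the option lists here happen to be fixed, since the slides do not spell out which combinations are excluded), and a grammar hint is shown only when the user asks for it. The hint texts and the labels for the 14+4 person/noun class options are placeholders, not data from Bothma et al. 2013.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    feature: str
    options: Callable[[dict], list]  # may depend on the selections made so far
    hint: str                        # grammar note, shown only on request


# Placeholder hints and option labels (hypothetical, not taken from the tool).
STEPS = [
    Step("aktionsart", lambda sel: ["stative", "inchoative"],
         "hint: stative vs. inchoative copulative"),
    Step("reading", lambda sel: ["identifying", "descriptive", "associative"],
         "hint: readings of the copulative, with examples"),
    Step("mood", lambda sel: ["indicative", "situative", "relative"],
         "hint: choice of mood"),
    # 14 + 4 options as on the slide; the labels are placeholders.
    Step("person/noun class", lambda sel: [f"person/class {i}" for i in range(1, 14 + 4 + 1)],
         "hint: persons and noun classes"),
    Step("polarity", lambda sel: ["positive", "negative"],
         "hint: positive vs. negated forms"),
]


def walk(steps, choose, wants_hint=lambda feature: False):
    """Ask one question per step; collect hints only where the user wants them."""
    selected, hints_shown = {}, []
    for step in steps:
        opts = step.options(selected)
        if wants_hint(step.feature):
            hints_shown.append(step.hint)
        choice = choose(step.feature, opts)
        if choice not in opts:
            raise ValueError(f"{choice!r} is not an option for {step.feature}")
        selected[step.feature] = choice
    return selected, hints_shown
```

The `choose` callback stands in for the user interface; passing `wants_hint` per step is one way to let users decide, choice point by choice point, how much grammar they want to see.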
Needs due to different levels of pre-existing knowledge
Combining data for communicative and cognitive needs
Learner-oriented tools for text production: e-Zulu (and e-Xhosa) dictionary and grammar trainer (Bosch/Faaß 2014; Sanasi 2015)
• Focus on the Zulu possessive construction:
  - Lexical choice of nominals for possessor and possession
  - Noun classes of possessor and possession
  - Noun-class-dependent connector (expressing the possessive relation)
  - Morphophonological adaptation rules
• Stepwise guidance on demand:
  - Nominal lexemes can be input in Zulu or English
  - Data about the noun class and the connector: either input by the user or provided by the system
  - etc.
• Reminder of rules on demand
→ From stepwise guidance to full translation
Needs due to different levels of pre-existing knowledge
e-Zulu dictionary (1/2) (Bosch/Faaß 2014)
• Input in English: rooms of hotel
• Choice options:
  - Translation only
  - Stepwise explanation of the Zulu rules applied
Needs due to different levels of pre-existing knowledge
e-Zulu dictionary (2/2) (Bosch/Faaß 2014)
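The possessive steps listed above (noun class of the possession, class-dependent connector, morphophonological adaptation) can be sketched as a lookup plus one vowel-coalescence rule. This is a simplified, hypothetical illustration, not the e-Zulu implementation: the concord table follows standard Zulu grammar descriptions and should be checked against a reference grammar; possessors of class 1a (e.g. personal names, which take *ka-*) and other special cases are ignored; the three-word lexicon is illustrative. The `explain` flag mirrors the two choice options "translation only" and "stepwise explanation".

```python
# Possessive concords by noun class of the possession, as given in standard
# Zulu grammars (simplified; to be checked against a reference grammar).
POSSESSIVE_CONCORD = {
    1: "wa", 2: "ba", 3: "wa", 4: "ya", 5: "la", 6: "a", 7: "sa",
    8: "za", 9: "ya", 10: "za", 11: "lwa", 14: "ba", 15: "kwa",
}

# Vowel coalescence: the concord's final -a merges with the possessor's initial vowel.
COALESCENCE = {"a": "a", "i": "e", "u": "o"}

# Tiny illustrative English -> Zulu lexicon: (noun, noun class).
LEXICON = {
    "book": ("incwadi", 9),
    "dogs": ("izinja", 10),
    "boy": ("umfana", 1),
}


def possessive(possession, possessor, explain=False):
    """Build '<possession> <concord+possessor>'; with explain=True also return the steps."""
    noun, cls = LEXICON[possession]
    owner, owner_cls = LEXICON[possessor]
    concord = POSSESSIVE_CONCORD[cls]
    vowel = COALESCENCE[owner[0]]
    phrase = f"{noun} {concord[:-1]}{vowel}{owner[1:]}"
    if not explain:
        return phrase  # "translation only"
    steps = [
        f"'{noun}' is in class {cls}: possessive concord '{concord}-'",
        f"'{owner}' (class {owner_cls}) begins with '{owner[0]}': a + {owner[0]} > {vowel}",
        f"result: {phrase}",
    ]
    return phrase, steps


print(possessive("book", "boy"))  # incwadi yomfana
```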
Needs due to different levels of pre-existing knowledge
Combining data for communicative and cognitive needs
Learner-oriented tools for text production (Prinsloo et al. 2014, 2015):
Sepedi (= Northern Sotho) sentence builder for speakers of English
• Phenomena:
  - Lexical selection: nominals, verbs
  - Noun class system of Sepedi: concords and pronouns
  - Grammatical rules for valency constructions, relative clauses, etc.
• Same principles as with Zulu possessives:
  - Individualization (Tarp 2011): at each step of text production, the user may decide whether and how much help to get from the tool
  - User input may be in either English or Sepedi, with the option open at each step of the sentence construction
  - Can be integrated with a large English → Sepedi dictionary
  - Grammatical information on demand
Models for user interaction
Information on demand (Bothma 2011)
• A basic amount of data is available by default
• Additional data may be accessed via unfoldable items:
  - Grammatical explanations in decision trees (Bothma et al. 2013)
  - “Info” button in the Sepedi sentence builder (Prinsloo et al. 2015)
  - Option to see explanations in learning tools (Sanasi 2015)
⇒ Open questions:
  - Deciding beforehand about the amount of data required (profile-based dictionaries), or deciding at each step of the text production process?
  - How much use do users make of the extra data on offer? (Trap-Jensen 2010)
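Information on demand can be modelled as an entry with a default part and named foldable sections. In the hypothetical sketch below (names `Entry`, `Viewer` and `unfold_log` are illustrative), folded sections are only announced, and each unfolding is counted, which is one simple way to gather data on how much of the extra offer users actually open.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Entry:
    lemma: str
    default: list   # data shown by default
    foldable: dict  # section name -> extra data, shown only on request


@dataclass
class Viewer:
    # How often each foldable section was opened.
    unfold_log: Counter = field(default_factory=Counter)

    def render(self, entry, unfold=()):
        lines = [entry.lemma, *entry.default]
        for name in unfold:
            if name in entry.foldable:
                self.unfold_log[name] += 1
                lines.append(f"[{name}] {entry.foldable[name]}")
        # Sections that stay folded are only announced.
        lines += [f"[+ {name}]" for name in entry.foldable if name not in unfold]
        return "\n".join(lines)
```

In a real tool the counts would need to be logged persistently and per user; here they are an in-memory counter only.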
Models for user interaction
Linguistic complexity ←→ interactional simplicity
• Dilemma:
  - Complex linguistic decision processes may require complex descriptions (Bantu languages, collocation selection)
  - But: many users want simple tools that are easy to use (Heid/Zimmermann 2012): few clicks, short explanations, little effort before getting to the result
• Proposal:
  - Providing guidance tools only on demand, in addition to “standard” dictionary entries
  - Possibly adding non-linear guidance devices, especially for learners: graphical elements (Runte 2015); interactive elements for learners to explore linguistic phenomena
Models for user interaction
Graphical display of lexical relations (Runte 2015)
• Display of relationships between lexemes:
  - Paradigmatic: synonyms, antonyms; hyp(er)onyms
  - Syntagmatic: typical adjectives; typical verbs, ...
[Figure: relation graph around Arbeitnehmer, with the nodes <Qualifikation>, qualifiziert, hochqualifiziert, Angestellter, Arbeiter, Erwerbstaetiger, Arbeitskraft, einstellen, beschaeftigen, kuendigen, arbeiten]
• Analyzed in eye-tracking studies: the presentation works well for learners
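A display like the one in the figure needs the relations stored by type before they can be drawn. This is a minimal sketch that exports them as Graphviz DOT text (drawing it, e.g. with `dot -Tsvg`, is outside the sketch). The split of the figure's nodes into paradigmatic and syntagmatic groups is this sketch's assumption, not data taken from Runte 2015; `<Qualifikation>` is left out because its role in the figure is unclear.

```python
# Lexemes from the figure, grouped by relation type. The grouping is
# this sketch's reading of the figure, not data taken from Runte 2015.
RELATIONS = {
    "Arbeitnehmer": {
        "paradigmatic": ["Angestellter", "Arbeiter", "Erwerbstaetiger", "Arbeitskraft"],
        "syntagmatic: adjective": ["qualifiziert", "hochqualifiziert"],
        "syntagmatic: verb": ["einstellen", "beschaeftigen", "kuendigen", "arbeiten"],
    }
}


def to_dot(relations):
    """Export the relations as a Graphviz DOT graph (undirected)."""
    lines = ["graph lexical_relations {"]
    for head, groups in relations.items():
        lines.append(f'  "{head}" [shape=box];')
        for relation, lexemes in groups.items():
            # Solid edges for paradigmatic, dashed for syntagmatic relations.
            style = "solid" if relation == "paradigmatic" else "dashed"
            for lexeme in lexemes:
                lines.append(f'  "{head}" -- "{lexeme}" [label="{relation}", style={style}];')
    lines.append("}")
    return "\n".join(lines)


print(to_dot(RELATIONS))
```

Distinguishing edge styles by relation type is one simple way to let learners see at a glance which neighbours can replace the headword and which ones combine with it.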
Conclusion
Lessons learnt from an overview of recent work
• Parameters relevant for the design of dictionaries in writing support tools:
  - Properties of targeted lexical objects: addressing linguistic complexity
  - Pre-existing knowledge of users: on lexical objects and their insertion into text
  - Flexibility with respect to interaction models: combining automatic and interactive use
• Current approaches (mainly from SeLA):
  - Constraint-based selection in a collocation dictionary
  - Stepwise guidance in decision trees
  - Learners' bilingual dictionaries with explanations
  - Stepwise sentence builder: flexible amounts of support
  - Graphical presentation
Future work
• User testing of prototypes, to understand which approaches work best
• From mock-ups and prototypes to tools with sizeable lexical resources:
  - e-Zulu: several hundred items
  - Sepedi sentence builder: work towards large grammatical coverage