Compiling a Monolingual Dictionary for Native Speakers

Published in: Education

  1. 1. Compiling a Monolingual Dictionary for Native Speakers <ul><li>Patrick Hanks </li></ul><ul><li>Formerly chief editor, Current English Dictionaries, Oxford University Press; </li></ul><ul><li>Editor, Collins English Dictionary; managing editor, Cobuild (1st edition). </li></ul><ul><li>Ljubljana </li></ul><ul><li>February 6, 2009 </li></ul>
  2. 2. Talk Outline <ul><li>What is a lexical database? What is a dictionary? </li></ul><ul><li>L1 dictionaries and their users </li></ul><ul><li>Words and their histories </li></ul><ul><li>Research: getting the words in </li></ul><ul><li>Macrostructure: the lexical item </li></ul><ul><ul><li>Including multiword expressions </li></ul></ul><ul><li>Microstructure: </li></ul><ul><ul><li>Lemma, pronunciation, meaning, use, ... </li></ul></ul><ul><li>The future of L1 dictionaries </li></ul><ul><ul><li>Print? CD-Rom? On-line? Hypertext links? </li></ul></ul>
  3. 3. A Lexical Database <ul><li>A lexical database is a summary of the evidence – corpus evidence – for each word in the language </li></ul><ul><li>The focus is typically on syntagmatics and collocations </li></ul><ul><ul><li>also on lemmatization, morphology, and meaning </li></ul></ul><ul><li>A primary resource for many applications </li></ul><ul><ul><li>Dictionary writing </li></ul></ul><ul><ul><li>Course-book writing </li></ul></ul><ul><ul><li>Education and error correction </li></ul></ul><ul><ul><li>Natural language processing and artificial intelligence </li></ul></ul><ul><ul><li>Codifying the relative importance of each sense of a word </li></ul></ul><ul><li>A lexical database should be scientifically and empirically well founded on a firm basis of evidence </li></ul><ul><ul><li>It is not a place for the pontifications of self-appointed pundits </li></ul></ul>
  4. 4. The social function of L1 dictionaries <ul><li>A great monolingual dictionary has a socially unifying function </li></ul><ul><ul><li>providing information about the conventions which speakers and hearers (writers and readers) rely on for mutual understanding </li></ul></ul><ul><li>It should explain meaning and usage clearly </li></ul><ul><ul><li>with a focus on explaining as well as defining </li></ul></ul><ul><li>It should report controversial issues about ‘correct’ usage and evaluate them objectively </li></ul><ul><ul><li>accurately reporting differences of dialect and register </li></ul></ul><ul><li>It should aim, inter alia, to reassure people who fear that their language is under threat </li></ul><ul><ul><li>but not make unnecessary concessions to irrational arguments about norms </li></ul></ul>
  5. 5. Typology of L1 English dictionaries <ul><li>British: </li></ul><ul><ul><li>Historical principles: Oxford English Dictionary [multivolume] </li></ul></ul><ul><ul><li>Synchronic principles: Collins, Chambers, (N)ODE [each is 1 volume] </li></ul></ul><ul><li>American: </li></ul><ul><ul><li>Historical principles: Merriam Webster’s Unabridged [multivolume], Merriam Webster’s Collegiate [1 vol.], </li></ul></ul><ul><ul><li>Synchronic principles: American Heritage </li></ul></ul><ul><li>Australian: Macquarie ( synchronic principles ) </li></ul>
  6. 6. What’s the difference between ‘historical principles’ and ‘synchronic principles’? <ul><li>Historical principles place the earliest meaning of a word first </li></ul><ul><ul><li>camera, noun [Latin camera ‘vaulted room’] 1686. 1. a small room. 2. the treasury of the papal curia. 3. a darkened box or room with a screen in it, onto which an image is projected ( camera obscura ).... 4. an apparatus for taking photographs or making films. … </li></ul></ul><ul><li>Synchronic principles place the current meaning first. </li></ul><ul><ul><li>camera, noun. an apparatus for taking photographs or making films. [from Latin camera ‘small room’] </li></ul></ul><ul><ul><li>camera obscura, noun. a darkened box or room with a screen in it, onto which an image is projected. ... [Latin: ‘dark room’] </li></ul></ul>
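The two orderings on slide 6 amount to two different sorts over the same list of senses. The following is a minimal Python sketch, not a description of how any real dictionary is compiled. The `Sense` record and its field names are hypothetical; the historical ranks follow the slide's numbering for camera, and `is_current` is an assumed flag marking the sense the slide treats as the current meaning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    historical_rank: int  # order of emergence, taken from the slide's numbering
    gloss: str
    is_current: bool      # assumption: marks the everyday modern meaning


# Senses of "camera" as listed on the slide, deliberately stored out of order.
CAMERA = [
    Sense(4, "an apparatus for taking photographs or making films", True),
    Sense(1, "a small room", False),
    Sense(3, "a darkened box or room with a screen in it (camera obscura)", False),
    Sense(2, "the treasury of the papal curia", False),
]


def historical_order(senses):
    """Historical principles: earliest meaning first."""
    return sorted(senses, key=lambda s: s.historical_rank)


def synchronic_order(senses):
    """Synchronic principles: current meaning first, then the rest by age."""
    return sorted(senses, key=lambda s: (not s.is_current, s.historical_rank))


for label, ordering in (("historical", historical_order), ("synchronic", synchronic_order)):
    print(label)
    for number, sense in enumerate(ordering(CAMERA), start=1):
        print(f"  {number}. {sense.gloss}")
```

A real synchronic entry would go further than reordering: as the slide shows, obsolete senses may be dropped or moved to a separate headword such as camera obscura.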
  7. 7. The instability of word meaning <ul><ul><li>The synchronic/historical distinction affects many words. </li></ul></ul><ul><ul><ul><li>field: enclosed land. [Old English feld ‘open country’] </li></ul></ul></ul><ul><ul><ul><li>gay: homosexual. [meant ‘cheerful’ until about 1965] </li></ul></ul></ul><ul><ul><ul><li>intercourse: sex act. [meant ‘conversation’ until C20] </li></ul></ul></ul><ul><ul><ul><li>kind: considerate and friendly. [Old English: ‘noble, well-bred’] </li></ul></ul></ul><ul><ul><ul><li>magazine: 1. periodical publication. 2. holder for cartridges on a gun or revolver. [Arabic: ‘storehouse’ ] </li></ul></ul></ul><ul><ul><ul><li>sock . [Latin soccus ‘light shoe worn by a comic actor’] </li></ul></ul></ul><ul><ul><ul><li>size: dimension, magnitude. [from assizes ‘session of a local law court’: a size loaf was a loaf of court-approved dimensions] </li></ul></ul></ul><ul><ul><li>Today’s exploitation may become tomorrow’s norm. </li></ul></ul>
  8. 8. Word histories <ul><li>Modern British and American dictionaries – even dictionaries on synchronic principles – have a C19 model of word history </li></ul><ul><li>They tell the semantic development – how each word developed its modern meaning(s) – including changes that took place in the L1 – as well as the morphological development of etymons since IE </li></ul><ul><li>They also discuss cognates (not just false friends), semantic equivalents, fossilized metaphors, and the origins of idioms: </li></ul><ul><ul><li>English magazine, French magasin </li></ul></ul><ul><ul><li>English crane, French grue, Czech jeřáb </li></ul></ul><ul><ul><li>kick the bucket, keep one’s head above water </li></ul></ul>
  9. 9. Getting the words in (1) <ul><li>Building on existing dictionaries </li></ul><ul><ul><li>Lexicography is accretive </li></ul></ul><ul><ul><li>Danger of mindlessly copying errors and out-of-date information </li></ul></ul><ul><li>The Oxford reading program: </li></ul><ul><ul><li>150 years of research to find millions of citations </li></ul></ul><ul><ul><li>But not a balanced corpus </li></ul></ul><ul><li>Directed reading research – specialist areas </li></ul><ul><li>Searching corpus data: </li></ul><ul><ul><li>low yield for new words </li></ul></ul><ul><ul><li>high yield for phraseology, collocation, usage </li></ul></ul><ul><li>Trawling the internet. Problems: </li></ul><ul><ul><li>sorting the “new words” from the rubbish </li></ul></ul><ul><ul><li>many “new words” are in fact multiword expressions </li></ul></ul><ul><ul><li>They are hard to find by web crawling programs </li></ul></ul>
  10. 10. Getting the words in (2) <ul><li>Building on existing dictionaries </li></ul><ul><ul><li>Lexicography is accretive </li></ul></ul><ul><ul><li>Danger of mindlessly copying errors and out-of-date information </li></ul></ul><ul><ul><li>How to keep the lexicographers awake? </li></ul></ul><ul><li>The Oxford reading program: huge expense </li></ul><ul><li>Directed reading research – specialist areas </li></ul><ul><li>Searching corpus data: low yield </li></ul><ul><li>Trawling the internet. Problems: </li></ul><ul><ul><li>sorting the “new words” from the rubbish </li></ul></ul><ul><ul><li>many “new words” are in fact multiword expressions </li></ul></ul>
  11. 11. Why do people want a dictionary of their native language? <ul><li>There are no good recent studies of L1 dictionary use in English </li></ul><ul><li>Academic studies of dictionary use are mostly of bilingual and foreign learners’ dictionaries: </li></ul><ul><ul><li>e.g. Atkins and Varantola (1997) studied dictionary use in translation tasks and language learning, but not native speaker use </li></ul></ul><ul><li>L1 and L2 dictionaries are quite different </li></ul><ul><ul><li>Foreign learners want to know what every native speaker knows already </li></ul></ul><ul><ul><li>Native speakers have a much broader spectrum of needs – peripheral, not central usage </li></ul></ul>
  12. 12. Informal feedback from marketing departments (1) <ul><li>People use an L1 dictionary mainly: </li></ul><ul><ul><li>for correct spelling (English is problematic) </li></ul></ul><ul><ul><ul><li>In Slovenian, maybe for correct morphology? </li></ul></ul></ul><ul><ul><li>for guidance on correct usage and word choice, e.g. </li></ul></ul><ul><ul><ul><li>‘uninterested’ vs. ‘disinterested’; ‘refute’ vs. ‘deny’; ‘bored with’ or ‘bored of’ </li></ul></ul></ul><ul><ul><ul><li>Is it wrong to split an infinitive (e.g. ‘to boldly go’)? </li></ul></ul></ul><ul><ul><li>for instant cultural reference information, e.g. </li></ul></ul><ul><ul><ul><li>“What’s the scientific name for a thrush?” </li></ul></ul></ul><ul><ul><ul><li>“Is your scapula your collarbone or your shoulder blade?” </li></ul></ul></ul><ul><ul><ul><li>“What’s the capital of Chile?” </li></ul></ul></ul><ul><ul><li>for browsing, e.g. “Why is a madrigal called a madrigal?” </li></ul></ul>
  13. 13. Informal feedback from marketing departments (2) <ul><li>An L1 dictionary is also used: </li></ul><ul><ul><li>as a source of information about rare words and senses </li></ul></ul><ul><ul><ul><li>“What does nook-shotten [in Shakespeare] mean? What is a predator, and can you use it to describe a person? Is a penguin a predator? What are chinos? What is an ohm? What is a joule, and why is it so called?” </li></ul></ul></ul><ul><ul><li>for word games (e.g. Scrabble): “Is aa an English word?” </li></ul></ul><ul><li>People want to have an authoritative inventory of their language, even if (in practice) they never look at it </li></ul><ul><ul><li>They also want ‘fun words’ – e.g. cutpurse, mosstrooper, yegg, snakehead, tsotsi, rudeboy, grifter (various criminals) </li></ul></ul><ul><ul><li>And ‘new words’ – which provide journalistic copy </li></ul></ul>
  14. 14. The role of corpus data <ul><li>Corpora show how each word is used </li></ul><ul><ul><li>providing an essential source of information for collocations and syntagmatics (studied statistically) </li></ul></ul><ul><ul><li>a framework, a solid empirical foundation for a dictionary </li></ul></ul><ul><ul><li>but don’t stop there! </li></ul></ul><ul><li>Other kinds of information must be slotted into this framework, e.g. </li></ul><ul><ul><li>Etymologies and word histories </li></ul></ul><ul><ul><li>Guidance on ‘correct’ usage </li></ul></ul><ul><ul><li>Scientific and technical definitions </li></ul></ul><ul><ul><li>Consistency of sets (e.g. ‘all’ the terminology of cricket) </li></ul></ul><ul><li>A corpus cannot be the only source for lexical data </li></ul><ul><ul><li>Lexicographers reading newspapers, watching TV, note how things are said (the words used), not what is said (content of the message) </li></ul></ul>
  15. 15. The dictionary as inventory <ul><li>An L1 dictionary should contain “all the words in the language” </li></ul><ul><ul><li>but is this possible? The lexicon is constantly growing </li></ul></ul><ul><li>and “all the meanings of each word” </li></ul><ul><ul><li>but word meaning is imprecise and fluid, not fixed </li></ul></ul><ul><li>guidance on how each word is used (syntagmatics) </li></ul><ul><ul><li>By examples of usage, rather than by abstract formulations in the technical language of linguistics </li></ul></ul><ul><ul><li>Dictionaries are for people, not for linguists! </li></ul></ul>
  16. 16. Researching lexical items: collecting evidence <ul><li>Building on existing dictionaries </li></ul><ul><ul><li>Lexicography is accretive </li></ul></ul><ul><ul><li>Danger of mindlessly copying errors and out-of-date information </li></ul></ul><ul><ul><li>How to keep the lexicographers awake? </li></ul></ul><ul><li>The Oxford reading program: huge expense </li></ul><ul><li>Directed reading research – specialist areas </li></ul><ul><li>Searching corpus data: low yield for new words </li></ul><ul><li>Trawling the internet. Problems: </li></ul><ul><ul><li>sorting the “new words” from the rubbish </li></ul></ul><ul><ul><li>Many so-called “new words” are in fact multiword expressions </li></ul></ul>
  17. 17. Some 2006 new words: from Macmillan English Dictionary <ul><li>blogosphere , noun. the imaginary place on the Internet where people’s blogs go so that other people can read them and react to them: software that tracks mood swings across the ‘blogosphere’ and pinpoints the events behind them ... </li></ul><ul><li>chav, noun. someone, especially a working-class person who is not well educated, dresses in designer clothes and wears a lot of gold jewellery but whose appearance shows bad taste. </li></ul><ul><li>air kiss, career gapper, Chelsea tractor, chick lit, civil partnership, designer baby, green audit, hissy fit, intelligent design </li></ul>
  18. 18. Even Homer nods (especially when copying) <ul><li>dord, n. density. </li></ul><ul><ul><li>actually copied from another dictionary </li></ul></ul><ul><ul><ul><li>“D. or d. density.” </li></ul></ul></ul><ul><ul><li>Example from an American dictionary of the 1960s, cited by David Crystal </li></ul></ul><ul><li>intercourse, noun. 1. communication or dealings between individuals or groups: everyday social intercourse. 2. short for SEXUAL INTERCOURSE. </li></ul><ul><ul><li>NODE (1998, ODE 2005) </li></ul></ul><ul><ul><li>Sense 2 is the usual sense of the modern word; it should be the main definition, not a mere cross-reference. </li></ul></ul>
  19. 19. Terminology of special fields <ul><li>Science, technology, sports, pastimes, slang: </li></ul><ul><ul><li>How far should an L1 dictionary go in covering these? </li></ul></ul><ul><li>strobila, strobilus, strobilation, googly, chav </li></ul><ul><ul><li>chav is a coelacanth among slang words: very ancient, but only recently discovered. The etymology is Romany </li></ul></ul><ul><li>Native speakers who do not know these words rightly expect to find them in a dictionary. </li></ul><ul><li>But a dictionary is not a term bank. </li></ul>
  20. 20. L1 dictionary macrostructure <ul><li>The lexical item </li></ul><ul><ul><li>words </li></ul></ul><ul><ul><li>multiword expressions </li></ul></ul><ul><ul><li>idioms and phrasal verbs </li></ul></ul><ul><ul><ul><li>where to put them? E.g. bite the dust – at dust or bite ? </li></ul></ul></ul><ul><ul><li>prefixes and suffixes; combining forms </li></ul></ul><ul><ul><ul><li>e.g. un-, -ation, -oholic, brachy-, -algia </li></ul></ul></ul><ul><ul><li>abbreviations? </li></ul></ul><ul><ul><li>names? </li></ul></ul>
  21. 21. Microstructure <ul><ul><li>Lemma (inflected forms) </li></ul></ul><ul><ul><li>Pronunciation </li></ul></ul><ul><ul><li>Wordclass and subcategorization </li></ul></ul><ul><ul><li>Selectional preferences and phraseology </li></ul></ul><ul><ul><li>Syntax and syntagmatics </li></ul></ul><ul><ul><li>definitions </li></ul></ul><ul><ul><li>Guidance on correct usage </li></ul></ul><ul><ul><li>Etymology and word histories </li></ul></ul>
  22. 22. The lemma <ul><li>strong, stronger, strongest </li></ul><ul><ul><li>strongly </li></ul></ul><ul><li>strength </li></ul><ul><li>strengthen </li></ul><ul><li>emblazon (but emblazoned is 100 times commoner) </li></ul><ul><li>frightened, frightening (forms of the verb, or adjectives in their own right?) </li></ul>
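The lemma question on slide 22 (which form should be the headword when an inflected form such as emblazoned is far commoner than the base form?) can be explored by counting word forms per lemma in a corpus. The sketch below is a minimal Python illustration under stated assumptions: the `LEMMA_OF` table is hand-written for the slide's examples, the sample sentence is invented, and a real project would use a proper lemmatizer and a large corpus.

```python
from collections import Counter

# Hand-written form-to-lemma table covering only the slide's examples
# (assumption: a real project would use a full lemmatizer instead).
LEMMA_OF = {
    "strong": "strong",
    "stronger": "strong",
    "strongest": "strong",
    "emblazon": "emblazon",
    "emblazoned": "emblazon",
}


def form_counts_by_lemma(tokens):
    """Group token counts by lemma, keeping a separate count for each form."""
    counts = {}
    for token in tokens:
        form = token.lower()
        lemma = LEMMA_OF.get(form)
        if lemma is None:
            continue  # form not covered by the toy table
        counts.setdefault(lemma, Counter())[form] += 1
    return counts


# Invented sample text, used only to exercise the function.
tokens = (
    "The strongest shields were emblazoned with a stronger crest "
    "emblazoned in gold"
).split()

profile = form_counts_by_lemma(tokens)
for lemma, forms in profile.items():
    print(lemma, dict(forms))
```

A profile like this shows at a glance when a lemma is attested mainly, or only, through one inflected form, which is the situation the slide describes for emblazon.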
  23. 23. Pronunciation <ul><li>Should a printed L1 dictionary text give guidance on pronunciation at all? </li></ul><ul><ul><li>More useful in English than in Slovenian? </li></ul></ul><ul><li>Use the International Phonetic Alphabet or some sort of spelling-rewrite system? </li></ul><ul><li>Why give pronunciations only for headwords? Why not also for inflections? </li></ul><ul><li>An electronic product can be multimedia, so hypertext links to a spoken representation seem an obvious answer </li></ul><ul><ul><li>But in which dialect? </li></ul></ul>
  24. 24. Dictionary definitions <ul><li>What is a word meaning? Does it exist? </li></ul><ul><ul><li>“A text is a unique deployment of meaningful units, and its particular meaning is not adequately accounted for by any organized concatenation of the fixed meanings of each unit. This is because some aspects of textual meaning arise from the particular combination of choices ...” – J. Sinclair 2004 </li></ul></ul><ul><ul><li>Not least because the meaning of each unit is not fixed! </li></ul></ul><ul><ul><li>Dictionaries can’t account for everything in the meaning of a text. But they can account for some things. (An elephant is not a toothpick.) </li></ul></ul>
  25. 25. Writing definitions of technical terms <ul><li>Stipulations by scientific committees and other classifying systems </li></ul><ul><li>Stipulations, not natural language! </li></ul><ul><li>Need both </li></ul><ul><li>Examples: second, spider </li></ul><ul><li>Interface between the lexicographer and the scientist (the user of the term) </li></ul>
  26. 26. technical definitions (1) <ul><li>second, noun. a sixtieth of a minute of time, which as the SI unit of time is defined in terms of the natural periodicity of the radiation of a caesium-133 atom. </li></ul><ul><ul><ul><li>informal a very short time: his eyes met Charlotte’s for a second. </li></ul></ul></ul><ul><li>(N)ODE </li></ul>
  27. 27. technical definitions (2) <ul><li>spider : an eight-legged predatory arachnid with an unsegmented body consisting of a fused head and thorax and a rounded abdomen. Spiders have fangs which inject poison into their prey, and most kinds spin webs in which to capture insects. </li></ul><ul><li>Order: Araneae, class: Arachnida. </li></ul><ul><li>(N)ODE </li></ul>
  28. 28. Word sketch for ‘spider’ <ul><li>object_of : 134 1.5 catch  9 15 3.93 watch  6 eat 4 3.43 find  8 29 0.89 put  4 see  9 get  8 come 5 0.33 </li></ul><ul><li>subject_of : 137 3.0 scuttle 3 7.86 crawl 4 6.76 spin 4 6.02 climb 10 5.83 bite 3 5.1 feed 3 3.65 wait 3 2.61 live  4 20 2.22 run  6 go  6 come  4 </li></ul><ul><li>a_modifier : 211 1.8 trap-door 4 9.24 bird-eating 3 8.84 tarantula 3 8.82 jumping 4 8.64 sedentary 3 8.11 poisonous 4 7.61 giant 12 7.44 hairy 3 7.2 gigantic 3 7.19 tiny  8 57 5.4 black  18 huge  3 white  6 great  9 large  6 little  3 small  4 female 4 4.52 </li></ul><ul><li>n_modifier : 132 1.3 Insy  14 28 11.4 Winsy  14 bola 4 9.9 orb 4 8.66 raft 9 8.56 fen 5 7.51 crab 4 7.05 widow 11 6.83 wolf 4 6.72 hunting 4 5.65 forest 4 3.23 sea 3 2.76 house 5 0.73 </li></ul><ul><li>modifies : 158 0.7 mite 17 9.65 catcher 3 8.29 monkey 15 8.18 web 12 8.03 venom 4 7.78 crab 5 7.33 rider 10 6.71 climb 4 6.39 silk 5 5.52 leg 4 2.64 affair 3 2.28 plant  3 15 1.89 woman  4 family  3 system  5 </li></ul><ul><li>and/or : 219 1.8 scorpion 11 9.62 cockroach 3 7.94 beetle  8 25 7.85 insect  12 fly  5 caterpillar 5 7.79 octopus 3 7.66 boar 5 7.41 crab 5 7.25 wolf 6 7.21 web 7 7.19 mite 3 7.03 spider  6 12 6.89 snake  6 bug 3 6.46 bird 5 3.55 </li></ul>
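In a word sketch like the one on slide 28, each collocate carries a raw co-occurrence count and a salience score, so that a rare but typical collocate such as scuttle ranks above a frequent but uninformative one such as come. The Python sketch below illustrates the idea with logDice, one widely used association measure. This is an assumption: the slide does not say which statistic produced its scores, and all counts in the example are invented.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y)).

    f_xy: co-occurrences of node and collocate in the given relation
    f_x:  corpus frequency of the node word
    f_y:  corpus frequency of the collocate
    """
    if f_xy <= 0:
        raise ValueError("co-occurrence count must be positive")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def rank_collocates(node_freq, collocates):
    """Rank collocates by score; collocates maps word -> (f_xy, f_y)."""
    scored = [
        (word, log_dice(f_xy, node_freq, f_y))
        for word, (f_xy, f_y) in collocates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented counts for verbs taking "spider" as subject (illustration only).
SPIDER_FREQ = 1000
SUBJECT_OF = {
    "scuttle": (3, 50),      # rare verb, strongly tied to the node
    "come": (4, 100_000),    # very common verb, weakly tied to the node
}

for verb, score in rank_collocates(SPIDER_FREQ, SUBJECT_OF):
    print(f"{verb:10s} {score:5.2f}")
```

The maximum possible score is 14, reached only when two words always occur together. Normalizing by the collocate's own frequency is what pushes common verbs down the list even when their raw counts are higher.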
  29. 29. Corpus-based profile for ‘spider’ <ul><li>Many thousands of species of spiders are known (funnel-web, web-building, orb-weaving, bird-eating, ground-dwelling, giant, huge, large, tiny, poisonous, black widow, camel, redback, trapdoor, wolf, whitetail, crab, tarantula, etc.). </li></ul><ul><li>Some species of spiders hunt prey. </li></ul><ul><li>Spiders bite. </li></ul><ul><li>Some species of spiders are poisonous. </li></ul><ul><li>Many species of spiders spin webs, with threads of strong silk. </li></ul><ul><li>Spiders lurk in the centre of their webs. </li></ul><ul><li>Spiders control what is going on in their webs. </li></ul><ul><li>Spiders have eight legs. </li></ul><ul><li>Their legs are thin, hairy, and long in proportion to body size. </li></ul><ul><li>Spiders have eight eyes. </li></ul><ul><li>Spiders spend a lot of time being motionless. </li></ul><ul><li>Spiders’ movement is sudden. </li></ul><ul><li>Spiders crawl. </li></ul><ul><li>Spiders scuttle. </li></ul><ul><li>Spiders are swift and agile. </li></ul><ul><li>Spiders can run up walls. </li></ul><ul><li>Many people have a dread of (hate) spiders. </li></ul><ul><li>People kill spiders. </li></ul><ul><li>English people are much concerned with trying to get spiders out of the bath. </li></ul>
  30. 30. The virtues of brevity <ul><li>Avoid verbosity! </li></ul><ul><li>Even if in the dictionary of the future space is unlimited, dictionary entries should be brief, concise, and to the point. </li></ul><ul><ul><li>Lumping and splitting </li></ul></ul><ul><ul><li>Ockham’s razor </li></ul></ul><ul><ul><li>Menu-driven hierarchies of information </li></ul></ul>
  31. 31. Lexical syntagmatics <ul><li>Convention: </li></ul><ul><li>A dictionary can show the relations between typical, normal phraseology and typical, normal meaning, e.g.: </li></ul><ul><li>frighten, verb. </li></ul><ul><ul><li>Something frightens a person or animal = cause to feel fear </li></ul></ul><ul><ul><li>.. frighten someone off /away </li></ul></ul><ul><ul><li>.. frighten someone into doing something </li></ul></ul><ul><ul><li>.. frighten the children upstairs into bed </li></ul></ul><ul><ul><li>.. frighten someone out of their skin/wits </li></ul></ul><ul><ul><li>.. frighten the life (living daylight) out of someone </li></ul></ul>
  32. 32. Selecting examples of usage <ul><li>No invented examples! </li></ul><ul><ul><li>Intuitions and usage are inverse variables. </li></ul></ul><ul><ul><li>Plenty of corpus evidence to choose from. </li></ul></ul><ul><ul><li>Beware of distortion through shortening </li></ul></ul><ul><li>Choose natural, normal examples, not boundary cases. </li></ul>
  33. 33. The need to get on with it! <ul><li>The lexicon of a language is large. Dictionary compilation is a huge task. </li></ul><ul><li>The editor must make policy decisions and everyone must stick to them </li></ul><ul><ul><li>There is no time for agonizing. </li></ul></ul><ul><ul><li>Anyway, agonizing is often counterproductive. </li></ul></ul><ul><li>When compiling, compilers should “do their honest best”. </li></ul><ul><ul><li>A system must be set up for spotting obvious errors and accidental infelicities of wording </li></ul></ul><ul><ul><li>Lexicographers read and check each other’s work </li></ul></ul>
  34. 34. The future of L1 dictionaries <ul><li>The medium: </li></ul><ul><ul><li>Print? CD-Rom? On-line? </li></ul></ul><ul><ul><li>On-line dictionaries of the future will be locations that summarize and interface </li></ul></ul><ul><ul><li>Menu-driven information hierarchies </li></ul></ul><ul><li>The message: </li></ul><ul><ul><li>Hypertext links to pre-processed corpus evidence, a grammar, an encyclopedia, other reference sources, other data of all kinds: the dictionary will a) summarize, b) typify, and c) interface </li></ul></ul><ul><ul><li>Corpus-based syntagmatics (dogs bark, wolves howl; lions roar, cats miaow) </li></ul></ul><ul><ul><li>Multimedia (sound, photos, film clips. Smell? taste? touch?) </li></ul></ul><ul><ul><li>Links to scientific taxonomies, e.g. Linnaean classification of flora and fauna </li></ul></ul>
  35. 35. Conclusions (1) <ul><li>L1 lexicographers are not linguists </li></ul><ul><ul><li>A self-indulgent belief </li></ul></ul><ul><ul><li>Linguistics is fatal for good lexicography </li></ul></ul><ul><ul><li>Lexicographers should know a bit about linguistics, but they need to know about a lot of other things too </li></ul></ul><ul><ul><li>Lexicography is a team game. “Renaissance man” is dead (as far as dictionary writing is concerned) </li></ul></ul><ul><li>What are they, then? </li></ul><ul><ul><li>Inventory clerks? Public servants? Cultural, social, and literary historians? Creative writers? Hack journalists? </li></ul></ul><ul><ul><li>All of these and more. </li></ul></ul><ul><ul><li>A lexicographer is a lexicographer! </li></ul></ul>
  36. 36. Conclusions (2) <ul><li>Evidence </li></ul><ul><ul><li>Corpus shows word usage, both regular and irregular </li></ul></ul><ul><ul><li>Other research is needed for terminology, names, word histories, and attitudes to the ‘correctness’ of controversial expressions </li></ul></ul><ul><li>Interpretation </li></ul><ul><ul><li>Definitions should explain, not merely define </li></ul></ul><ul><ul><li>Authoritative pronouncements must be based on evidence, not merely opinion </li></ul></ul><ul><ul><li>But public attitudes to ‘correctness’ need to be reported objectively as well as evaluated </li></ul></ul><ul><ul><li>Explain all normal, central uses and meanings </li></ul></ul><ul><ul><li>Don’t try to cover all possibilities! </li></ul></ul><ul><ul><ul><li>If you do, the language will defeat you, for word meaning and use is infinitely flexible </li></ul></ul></ul>