Transformational grammar
From Wikipedia, the free encyclopedia

In linguistics, a transformational grammar or transformational-generative grammar (TGG) is a generative grammar, especially of a natural language, that has been developed in a Chomskyan tradition. The term also names the Chomskyan tradition itself, which gives rise to specific transformational grammars. Much current research in transformational grammar is inspired by Chomsky's Minimalist Program.

Contents
    1 Deep structure and surface structure
    2 Development of basic concepts
    3 Innate linguistic knowledge
    4 Grammatical theories
    5 "I-Language" and "E-Language"
    6 Grammaticality
    7 Minimalism
    8 Mathematical representation
    9 Transformations
    10 See also
    11 References
    12 External links

Deep structure and surface structure
In 1965, Noam Chomsky published Aspects of the Theory of Syntax, in which he developed the idea that each sentence in a language has two levels of representation: a deep structure and a surface structure. The deep structure represented the core semantic relations of a sentence, and was mapped on to the surface structure (which followed the phonological form of the sentence very closely) via transformations. Chomsky believed there are considerable similarities between languages' deep structures, and that these structures reveal properties, common to all languages, that surface structures conceal. However, this may not have been the central motivation for introducing deep structure. Transformations had been proposed prior to the development of deep structure as a means of increasing the mathematical and descriptive power of context-free grammars. Similarly, deep structure was devised largely for technical reasons relating to early semantic theory. Chomsky emphasizes the importance of modern formal mathematical devices in the development of grammatical theory:

    But the fundamental reason for [the] inadequacy of traditional grammars is a more technical one. Although it was well understood that linguistic processes are in some sense "creative," the technical devices for expressing a system of recursive processes were simply not available until much more recently. In fact, a real understanding of how a language can (in Humboldt's words) "make infinite use of finite means" has developed only within the last thirty years, in the course of studies in the foundations of mathematics.
    (Aspects of the Theory of Syntax)

Development of basic concepts

Though transformations continue to be important in Chomsky's current theories, he has now abandoned the original notion of Deep Structure and Surface Structure.
Initially, two additional levels of representation were introduced (LF, Logical Form, and PF, Phonetic Form), and then in the 1990s Chomsky sketched out a new program of research known as Minimalism, in which Deep Structure and Surface Structure no longer featured and PF and LF remained as the only levels of representation.

To complicate the understanding of the development of Noam Chomsky's theories, the precise meanings of Deep Structure and Surface Structure have changed over time; by the 1970s, the two were normally referred to simply as D-Structure and S-Structure by Chomskyan linguists. In
particular, the idea that the meaning of a sentence was determined by its Deep Structure (taken to its logical conclusions by the generative semanticists during the same period) was dropped for good by Chomskyan linguists when LF took over this role (previously, Chomsky and Ray Jackendoff had begun to argue that meaning was determined by both Deep and Surface Structure).

Innate linguistic knowledge

Terms such as "transformation" can give the impression that theories of transformational generative grammar are intended as a model for the processes through which the human mind constructs and understands sentences. Chomsky is clear that this is not in fact the case: a generative grammar models only the knowledge that underlies the human ability to speak and understand. One of the most important of Chomsky's ideas is that most of this knowledge is innate, with the result that a baby can have a large body of prior knowledge about the structure of language in general, and need only actually learn the idiosyncratic features of the language(s) it is exposed to. Chomsky was not the first person to suggest that all languages had certain fundamental things in common (he quotes philosophers writing several centuries ago who had the same basic idea), but he helped to make the innateness theory respectable after a period dominated by more behaviorist attitudes towards language. Perhaps more significantly, he made concrete and technically sophisticated proposals about the structure of language, and made important proposals regarding how the success of grammatical theories should be evaluated.

Grammatical theories

In the 1960s, Chomsky introduced two central ideas relevant to the construction and evaluation of grammatical theories. The first was the distinction between competence and performance. Chomsky noted the obvious fact that people, when speaking in the real world, often make linguistic errors (e.g., starting a sentence and then abandoning it midway through).
He argued that these errors in linguistic performance were irrelevant to the study of linguistic competence (the knowledge that allows people to construct and understand grammatical sentences). Consequently, the linguist can study an idealised version of language, greatly simplifying linguistic analysis (see the "Grammaticality" section below). The second idea related directly to the evaluation of theories of grammar. Chomsky distinguished between grammars that achieve descriptive adequacy and those that go further and achieve explanatory adequacy. A descriptively adequate grammar for a particular language defines the (infinite) set of grammatical sentences in that language; that is, it describes the language in its entirety. A grammar that achieves explanatory adequacy has the additional property that it gives an insight into the underlying linguistic structures in the human mind; that is, it does not merely describe the grammar of a language, but makes predictions about how linguistic knowledge is mentally represented. For Chomsky, the nature of such mental representations is largely innate, so if a grammatical theory has explanatory adequacy it must be able to explain the various grammatical nuances of the languages of the world as relatively minor variations in the universal pattern of human language. Chomsky argued that, even though linguists were still a long way from constructing descriptively adequate grammars, progress in terms of descriptive adequacy will only come if linguists hold explanatory adequacy as their goal. In other words, real insight into
the structure of individual languages can only be gained through comparative study of a wide range of languages, on the assumption that they are all cut from the same cloth.

"I-Language" and "E-Language"

In 1986, Chomsky proposed a distinction between I-Language and E-Language, similar but not identical to the competence/performance distinction. I-Language refers to internal language and is contrasted with external language (E-Language). I-Language is taken to be the object of study in linguistic theory; it is the mentally represented linguistic knowledge that a native speaker of a language has, and is therefore a mental object; from this perspective, most of theoretical linguistics is a branch of psychology. E-Language encompasses all other notions of what a language is, for example that it is a body of knowledge or behavioural habits shared by a community. Thus, E-Language is not itself a coherent concept, and Chomsky argues that such notions of language are not useful in the study of innate linguistic knowledge, i.e., competence, even though they may seem sensible and intuitive, and useful in other areas of study. Competence, he argues, can only be studied if languages are treated as mental objects.

Grammaticality

Further information: Grammaticality

Chomsky argued that the notions "grammatical" and "ungrammatical" could be defined in a meaningful and useful way. In contrast, an extreme behaviorist linguist would argue that language can only be studied through recordings or transcriptions of actual speech, the role of the linguist being to look for patterns in such observed speech, but not to hypothesize about why such patterns might occur, nor to label particular utterances as either "grammatical" or "ungrammatical." Although few linguists in the 1950s actually took such an extreme position, Chomsky was at an opposite extreme, defining grammaticality in an unusually mentalistic way (for the time).
He argued that the intuition of a native speaker is enough to define the grammaticalness of a sentence; that is, if a particular string of English words elicits a double take, or feeling of wrongness in a native English speaker, and when various extraneous factors affecting intuitions are controlled for, it can be said that the string of words is ungrammatical. This, according to Chomsky, is entirely distinct from the question of whether a sentence is meaningful, or can be understood. It is possible for a sentence to be both grammatical and meaningless, as in Chomsky's famous example "colorless green ideas sleep furiously." But such sentences manifest a linguistic problem distinct from that posed by meaningful but ungrammatical (non)-sentences such as "man the bit sandwich the," the meaning of which is fairly clear, but which no native speaker would accept as well formed.

The use of such intuitive judgments permitted generative syntacticians to base their research on a methodology in which studying language through a corpus of observed speech became downplayed, since the grammatical properties of constructed sentences were considered to be appropriate data to build a grammatical model on.

Minimalism
Main article: Minimalist program

From the mid-1990s onwards, much research in transformational grammar has been inspired by Chomsky's Minimalist Program. The "Minimalist Program" aims at the further development of ideas involving economy of derivation and economy of representation, which had started to become significant in the early 1990s, but were still rather peripheral aspects of transformational-generative grammar theory.

    Economy of derivation is a principle stating that movements (i.e., transformations) only occur in order to match interpretable features with uninterpretable features. An example of an interpretable feature is the plural inflection on regular English nouns, e.g., dogs. The word dogs can only be used to refer to several dogs, not a single dog, and so this inflection contributes to meaning, making it interpretable. English verbs are inflected according to the number of their subject (e.g., "Dogs bite" vs "A dog bites"), but in most sentences this inflection just duplicates the information about number that the subject noun already has, and it is therefore uninterpretable.
    Economy of representation is the principle that grammatical structures must exist for a purpose, i.e., the structure of a sentence should be no larger or more complex than required to satisfy constraints on grammaticality.

Both notions, as described here, are somewhat vague, and indeed the precise formulation of these principles is controversial. An additional aspect of minimalist thought is the idea that the derivation of syntactic structures should be uniform; that is, rules should not be stipulated as applying at arbitrary points in a derivation, but instead apply throughout derivations. Minimalist approaches to phrase structure have resulted in "Bare Phrase Structure," an attempt to eliminate X-bar theory. In 1998, Chomsky suggested that derivations proceed in phases. The distinction of Deep Structure vs.
Surface Structure is not present in Minimalist theories of syntax, and the most recent phase-based theories also eliminate LF and PF as unitary levels of representation.

Mathematical representation

Returning to the more general mathematical notion of a grammar, an important feature of all transformational grammars is that they are more powerful than context-free grammars. This idea was formalized by Chomsky in the Chomsky hierarchy. Chomsky argued that it is impossible to describe the structure of natural languages using context-free grammars. His general position regarding the non-context-freeness of natural language has held up since then, although his specific examples regarding the inadequacy of CFGs in terms of their weak generative capacity were later disproven.

Transformations

The usual usage of the term transformation in linguistics refers to a rule that takes an input typically called the Deep Structure (in the Standard Theory) or D-structure (in the extended standard theory or government and binding theory) and changes it in some restricted way to
result in a Surface Structure (or S-structure). In TGG, Deep structures were generated by a set of phrase structure rules.

For example, a typical transformation in TG is the operation of subject-auxiliary inversion (SAI). This rule takes as its input a declarative sentence with an auxiliary, "John has eaten all the heirloom tomatoes.", and transforms it into "Has John eaten all the heirloom tomatoes?" In their original formulation (Chomsky 1957), these rules were stated as rules that held over strings of either terminals or constituent symbols or both.

    X NP AUX Y ⇒ X AUX NP Y

(where NP = Noun Phrase and AUX = Auxiliary)

In the 1970s, by the time of the Extended Standard Theory, following the work of Joseph Emonds on structure preservation, transformations came to be viewed as holding over trees. By the end of government and binding theory in the late 1980s, transformations were no longer structure-changing operations at all; instead they added information to already existing trees by copying constituents.

The earliest conceptions of transformations were that they were construction-specific devices. For example, there was a transformation that turned active sentences into passive ones. A different transformation raised embedded subjects into main clause subject position in sentences such as "John seems to have gone"; and yet a third reordered arguments in the dative alternation. With the shift from rules to principles and constraints that took place in the 1970s, these construction-specific transformations morphed into general rules (all the examples just mentioned being instances of NP movement), which eventually changed into the single general rule of move alpha or Move.

Transformations actually come in two types: (i) the post-Deep structure kind mentioned above, which are string or structure changing, and (ii) Generalized Transformations (GTs). Generalized transformations were originally proposed in the earliest forms of generative grammar (e.g., Chomsky 1957).
They take small structures, either atomic or generated by other rules, and combine them. For example, the generalized transformation of embedding would take the kernel "Dave said X" and the kernel "Dan likes smoking" and combine them into "Dave said Dan likes smoking." GTs are thus structure building rather than structure changing. In the Extended Standard Theory and government and binding theory, GTs were abandoned in favor of recursive phrase structure rules. However, they are still present in tree-adjoining grammar as the Substitution and Adjunction operations, and they have recently re-emerged in mainstream generative grammar in Minimalism, as the operations Merge and Move.

In generative phonology, another form of transformation is the phonological rule, which describes a mapping between an underlying representation (the phoneme) and the surface form that is articulated during natural speech.

See also
    Antisymmetry
    Generalised phrase structure grammar
    Generative semantics
    Head-driven phrase structure grammar
    Heavy NP shift
    Lexical functional grammar
    Parasitic gap

References

    1. Chomsky, Noam (1995). The Minimalist Program. MIT Press.
    2. Chomsky, Noam (1965). Aspects of the Theory of Syntax. MIT Press. ISBN 0262530074.
    3. The Port-Royal Grammar of 1660 identified similar principles; Chomsky, Noam (1972). Language and Mind. Harcourt Brace Jovanovich. ISBN 0151478104.
    4. Jackendoff, Ray (1974). Semantic Interpretation in Generative Grammar. MIT Press. ISBN 0262100134.
    5. May, Robert C. (1977). The Grammar of Quantification. MIT PhD dissertation. ISBN 0824013921. (Supervised by Noam Chomsky, this dissertation introduced the idea of "logical form.")
    6. Chomsky, Noam (1986). Knowledge of Language. New York: Praeger. ISBN 0275900258.
    7. Chomsky, Noam (2001). "Derivation by Phase." In Michael Kenstowicz (ed.) Ken Hale: A Life in Language. MIT Press. Pages 1-52. (See p. 49 fn. 2 for comment on E-Language.) In other words, in algebraic terms, the I-Language is the actual function, whereas the E-Language is the extension of this function.
    8. Newmeyer, Frederick J. (1986). Linguistic Theory in America (Second Edition). Academic Press.
    9. Chomsky, Noam (1995). The Minimalist Program. MIT Press. ISBN 0262531283.
    10. Lappin, Shalom; Robert Levine and David Johnson (2000). "Topic ... Comment". Natural Language & Linguistic Theory 18 (3): 665–671. doi:10.1023/A:1006474128258.
    11. Lappin, Shalom; Robert Levine and David Johnson (2001). "The Revolution Maximally Confused". Natural Language & Linguistic Theory 19 (4): 901–919. doi:10.1023/A:1013397516214.
    12. Peters, Stanley; R. Ritchie (1973). "On the generative power of transformational grammars". Information Sciences 6: 49–83. doi:10.1016/0020-0255(73)90027-3.
    13. Chomsky, Noam (1956). "Three models for the description of language". IRE Transactions on Information Theory 2 (3): 113–124.
    doi:10.1109/TIT.1956.1056813.
    14. Shieber, Stuart (1985). "Evidence against the context-freeness of natural language". Linguistics and Philosophy 8 (3): 333–343. doi:10.1007/BF00630917.
    15. Pullum, Geoffrey K.; Gerald Gazdar (1982). "Natural languages and context-free languages". Linguistics and Philosophy 4 (4): 471–504. doi:10.1007/BF00360802.
    16. Goldsmith, John A. (1995). "Phonological Theory". In John A. Goldsmith. The Handbook of Phonological Theory. Blackwell Handbooks in Linguistics. Blackwell Publishers. p. 2. ISBN 1405157682.
External links

    What is I-language? - Chapter 1 of I-language: An Introduction to Linguistics as Cognitive Science.
    The Syntax of Natural Language - an online textbook on transformational grammar.

Part-of-speech tagging

In corpus linguistics, part-of-speech tagging (POS tagging or POST), also called grammatical tagging or word-category disambiguation, is the process of marking up the words in a text (corpus) as corresponding to a particular part of speech, based on both their definition and their context, i.e. their relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc.

Once performed by hand, POS tagging is now done in the context of computational linguistics, using algorithms which associate discrete terms, as well as hidden parts of speech, in accordance with a set of descriptive tags. POS-tagging algorithms fall into two distinctive groups: rule-based and stochastic. E. Brill's tagger, one of the first and most widely used English POS taggers, employs rule-based algorithms.

Contents
    1 Principle
    2 History
        2.1 The Brown Corpus
        2.2 Use of Hidden Markov Models
        2.3 Dynamic Programming methods
        2.4 Unsupervised taggers
        2.5 Other taggers and methods
    3 Issues
    4 See also
    5 References
    6 STAR POS-tagger
    7 External links

Principle
Part-of-speech tagging is harder than just having a list of words and their parts of speech, because some words can represent more than one part of speech at different times, and because some parts of speech are complex or unspoken. This is not rare: in natural languages (as opposed to many artificial languages), a large percentage of word-forms are ambiguous. For example, even "dogs", which is usually thought of as just a plural noun, can also be a verb:

    The sailor dogs the hatch.

Performing grammatical tagging will indicate that "dogs" is a verb, and not the more common plural noun, since one of the words must be the main verb, and the noun reading is less likely following "sailor" (sailor !→ dogs). Semantic analysis can then extrapolate that "sailor" and "hatch" implicate "dogs" as 1) in the nautical context (sailor→<verb>←hatch) and 2) an action applied to the object "hatch" ([subject] dogs→hatch). In this context, "dogs" is a nautical term meaning "fastens (a watertight door) securely; applies a dog to".

"Dogged", on the other hand, can be either an adjective or a past-tense verb. Just which parts of speech a word can represent varies greatly.

Native speakers of a language perform grammatical and semantic analysis innately, and trained linguists can identify the grammatical parts of speech to various fine degrees depending on the tagging system. Schools commonly teach that there are 9 parts of speech in English: noun, verb, article, adjective, preposition, pronoun, adverb, conjunction, and interjection. However, there are clearly many more categories and sub-categories. For nouns, plural, possessive, and singular forms can be distinguished.
In many languages words are also marked for their "case" (role as subject, object, etc.), grammatical gender, and so on; while verbs are marked for tense, aspect, and other things.

In part-of-speech tagging by computer, it is typical to distinguish from 50 to 150 separate parts of speech for English, for example, NN for singular common nouns, NNS for plural common nouns, NP for singular proper nouns (see the POS tags used in the Brown Corpus). Work on stochastic methods for tagging Koine Greek (DeRose 1990) has used over 1,000 parts of speech, and found that about as many words were ambiguous there as in English. A morphosyntactic descriptor in the case of morphologically rich languages can be expressed like Ncmsan, which means Category = Noun, Type = common, Gender = masculine, Number = singular, Case = accusative, Animate = no.

History

The Brown Corpus

Research on part-of-speech tagging has been closely tied to corpus linguistics. The first major corpus of English for computer analysis was the Brown Corpus, developed at Brown University by Henry Kucera and Nelson Francis in the mid-1960s. It consists of about 1,000,000 words of running English prose text, made up of 500 samples from randomly chosen publications. Each sample is 2,000 or more words (ending at the first sentence-end after 2,000 words, so that the corpus contains only complete sentences).
The Brown Corpus was painstakingly "tagged" with part-of-speech markers over many years. A first approximation was done with a program by Greene and Rubin, which consisted of a huge handmade list of what categories could co-occur at all. For example, article then noun can occur, but article then verb (arguably) cannot. The program got about 70% correct. Its results were repeatedly reviewed and corrected by hand, and later users sent in errata, so that by the late 1970s the tagging was nearly perfect (allowing for some cases on which even human speakers might not agree).

This corpus has been used for innumerable studies of word-frequency and of part-of-speech, and inspired the development of similar "tagged" corpora in many other languages. Statistics derived by analyzing it formed the basis for most later part-of-speech tagging systems, such as CLAWS and VOLSUNGA. However, by 2005 it had been superseded by larger corpora such as the 100 million word British National Corpus.

For some time, part-of-speech tagging was considered an inseparable part of natural language processing, because there are certain cases where the correct part of speech cannot be decided without understanding the semantics or even the pragmatics of the context. This is extremely expensive, especially because analyzing the higher levels is much harder when multiple part-of-speech possibilities must be considered for each word.

Use of Hidden Markov Models

In the mid 1980s, researchers in Europe began to use hidden Markov models (HMMs) to disambiguate parts of speech, when working to tag the Lancaster-Oslo-Bergen Corpus of British English. HMMs involve counting cases (such as from the Brown Corpus), and making a table of the probabilities of certain sequences.
For example, once you've seen an article such as "the", perhaps the next word is a noun 40% of the time, an adjective 40%, and a number 20%. Knowing this, a program can decide that "can" in "the can" is far more likely to be a noun than a verb or a modal. The same method can of course be used to benefit from knowledge about following words.

More advanced ("higher order") HMMs learn the probabilities not only of pairs, but triples or even larger sequences. So, for example, if you've just seen an article and a verb, the next item may be very likely a preposition, article, or noun, but much less likely another verb.

When several ambiguous words occur together, the possibilities multiply. However, it is easy to enumerate every combination and to assign a relative probability to each one, by multiplying together the probabilities of each choice in turn. The combination with highest probability is then chosen. The European group developed CLAWS, a tagging program that did exactly this, and achieved accuracy in the 93-95% range.

It is worth remembering, as Eugene Charniak points out in Statistical techniques for natural language parsing, that merely assigning the most common tag to each known word and the tag "proper noun" to all unknowns will approach 90% accuracy, because many words are unambiguous.
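The enumerate-and-multiply procedure described above can be sketched as follows. This is a minimal illustration, not a reconstruction of CLAWS: the tag names, the `TRANSITION` and `LEXICON` tables, and every number in them are invented for the example (only the 40/40/20 split after an article echoes the text), and `best_tagging` is a hypothetical helper name.

```python
from itertools import product

# Toy figures invented for illustration; they are not measured from any corpus.
# TRANSITION[(previous_tag, tag)] approximates P(tag | previous_tag).
TRANSITION = {
    ("START", "ART"): 0.5,
    ("ART", "NOUN"): 0.4, ("ART", "ADJ"): 0.4, ("ART", "NUM"): 0.2,
    ("NOUN", "VERB"): 0.5, ("NOUN", "NOUN"): 0.2,
    ("MODAL", "VERB"): 0.7, ("MODAL", "NOUN"): 0.05,
    ("VERB", "NOUN"): 0.3, ("VERB", "VERB"): 0.05,
}

# LEXICON[word][tag] is a toy lexical weight for each tag the word can take.
LEXICON = {
    "the": {"ART": 1.0},
    "can": {"NOUN": 0.3, "MODAL": 0.6, "VERB": 0.1},
    "leaks": {"NOUN": 0.4, "VERB": 0.6},
}


def best_tagging(words, lexicon, transition):
    """Score every possible tag combination and keep the most probable one."""
    candidates = [list(lexicon[w]) for w in words]
    best_seq, best_p = None, 0.0
    for seq in product(*candidates):  # every combination of candidate tags
        p, prev = 1.0, "START"
        for word, tag in zip(words, seq):
            # Unlisted tag pairs are treated as impossible (probability 0).
            p *= transition.get((prev, tag), 0.0) * lexicon[word][tag]
            prev = tag
        if p > best_p:
            best_seq, best_p = seq, p
    return best_seq, best_p


tags, p = best_tagging(["the", "can", "leaks"], LEXICON, TRANSITION)
print(tags, round(p, 4))  # ('ART', 'NOUN', 'VERB') 0.018
```

Although "can" on its own is weighted most heavily as a modal here, the article before it makes the noun reading win. The number of combinations scored is the product of the number of candidate tags per word, which is why this approach becomes expensive for long runs of ambiguous words.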
CLAWS pioneered the field of HMM-based part-of-speech tagging, but was quite expensive since it enumerated all possibilities. It sometimes had to resort to backup methods when there were simply too many (the Brown Corpus contains a case with 17 ambiguous words in a row, and there are words such as "still" that can represent as many as 7 distinct parts of speech).

HMMs underlie the functioning of stochastic taggers and are used in various algorithms, one of the most widely used being the bi-directional inference algorithm.

Dynamic Programming methods

In 1987, Steven DeRose and Ken Church independently developed dynamic programming algorithms to solve the same problem in vastly less time. Their methods were similar to the Viterbi algorithm known for some time in other fields. DeRose used a table of pairs, while Church used a table of triples and an ingenious method of estimating the values for triples that were rare or nonexistent in the Brown Corpus (actual measurement of triple probabilities would require a much larger corpus). Both methods achieved accuracy over 95%. DeRose's 1990 dissertation at Brown University included analyses of the specific error types, probabilities, and other related data, and replicated his work for Greek, where it proved similarly effective.

These findings were surprisingly disruptive to the field of natural language processing. The accuracy reported was higher than the typical accuracy of very sophisticated algorithms that integrated part-of-speech choice with many higher levels of linguistic analysis: syntax, morphology, semantics, and so on. The methods of CLAWS, DeRose, and Church did fail for some of the known cases where semantics is required, but those proved negligibly rare.
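The dynamic programming idea can be illustrated with a small Viterbi-style sketch over a table of tag pairs. It is not DeRose's or Church's actual program; the tables, their toy numbers, and the function name `viterbi` are assumptions made for the example.

```python
# Toy figures invented for illustration; they are not measured from any corpus.
# TRANSITION[(previous_tag, tag)] approximates P(tag | previous_tag).
TRANSITION = {
    ("START", "ART"): 0.5,
    ("ART", "NOUN"): 0.4, ("ART", "ADJ"): 0.4, ("ART", "NUM"): 0.2,
    ("NOUN", "VERB"): 0.5, ("NOUN", "NOUN"): 0.2,
    ("MODAL", "VERB"): 0.7, ("MODAL", "NOUN"): 0.05,
    ("VERB", "NOUN"): 0.3, ("VERB", "VERB"): 0.05,
}

# LEXICON[word][tag] is a toy lexical weight for each tag the word can take.
LEXICON = {
    "the": {"ART": 1.0},
    "can": {"NOUN": 0.3, "MODAL": 0.6, "VERB": 0.1},
    "leaks": {"NOUN": 0.4, "VERB": 0.6},
}


def viterbi(words, lexicon, transition, start="START"):
    """Return (probability, tag list) of the most probable tagging."""
    # best[tag] holds the best-scoring path that ends in `tag` so far.
    best = {start: (1.0, [])}
    for word in words:
        nxt = {}
        for tag, lex_p in lexicon[word].items():
            # Only the best predecessor for each tag needs to be kept,
            # so work grows with sentence length instead of exploding.
            nxt[tag] = max(
                (
                    (p * transition.get((prev, tag), 0.0) * lex_p, path + [tag])
                    for prev, (p, path) in best.items()
                ),
                key=lambda cand: cand[0],
            )
        best = nxt
    return max(best.values(), key=lambda cand: cand[0])


p, tags = viterbi(["the", "can", "leaks"], LEXICON, TRANSITION)
print(tags, round(p, 4))  # ['ART', 'NOUN', 'VERB'] 0.018
```

The result matches what exhaustive enumeration would give on the same tables, but each word only requires comparing its candidate tags against the previous word's candidate tags, rather than scoring every full combination.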
This convinced many in the field that part-of-speech tagging could usefully be separated out from the other levels of processing; this in turn simplified the theory and practice of computerized language analysis, and encouraged researchers to find ways to separate out other pieces as well. Markov Models are now the standard method for part-of-speech assignment.

Unsupervised taggers

The methods already discussed involve working from a pre-existing corpus to learn tag probabilities. It is, however, also possible to bootstrap using "unsupervised" tagging. Unsupervised tagging techniques use an untagged corpus for their training data and produce the tagset by induction. That is, they observe patterns in word use, and derive part-of-speech categories themselves. For example, statistics readily reveal that "the", "a", and "an" occur in similar contexts, while "eat" occurs in very different ones. With sufficient iteration, similarity classes of words emerge that are remarkably similar to those human linguists would expect; and the differences themselves sometimes suggest valuable new insights.

Both supervised and unsupervised taggers can be further subdivided into rule-based, stochastic, and neural approaches.
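The observation that "the" and "a" occur in similar contexts while "eat" does not can be checked with a few lines of code. The sketch below covers only the similarity-measuring step, not a full tag-induction procedure; the five-sentence `CORPUS` is made up, and `context_vectors` and `cosine` are hypothetical helper names.

```python
from collections import Counter, defaultdict
from math import sqrt

# A tiny made-up, untagged corpus, used only for illustration.
CORPUS = [
    "the dog will eat the bone",
    "a dog will eat a bone",
    "an owl will eat the mouse",
    "the cat saw a mouse",
    "a cat saw an owl",
]


def context_vectors(sentences):
    """Count, for each word, which words appear directly left and right of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            vectors[tokens[i]][("L", tokens[i - 1])] += 1
            vectors[tokens[i]][("R", tokens[i + 1])] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vecs = context_vectors(CORPUS)
print(round(cosine(vecs["the"], vecs["a"]), 2))    # 0.91
print(round(cosine(vecs["the"], vecs["eat"]), 2))  # 0.0
```

Grouping words whose context vectors are close to each other is one simple way that similarity classes of the kind described above could start to emerge; real systems use far larger corpora and iterate the grouping.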
Other taggers and methods

Some current major algorithms for part-of-speech tagging include the Viterbi algorithm, Brill Tagger, Constraint Grammar, and the Baum-Welch algorithm (also known as the forward-backward algorithm). Hidden Markov model and visible Markov model taggers can both be implemented using the Viterbi algorithm.

Many machine learning methods have also been applied to the problem of POS tagging. Methods such as SVM, maximum entropy classifier, perceptron, and nearest-neighbor have all been tried, and most can achieve accuracy above 95%.

A direct comparison of several methods is reported (with references) at . This comparison uses the Penn tag set on some of the Penn Treebank data, so the results are directly comparable. However, many significant taggers are not included (perhaps because of the labor involved in reconfiguring them for this particular dataset). Thus, it should not be assumed that the results reported there are the best that can be achieved with a given approach; nor even the best that have been achieved with a given approach.

Issues

While there is broad agreement about basic categories, a number of edge cases make it difficult to settle on a single "correct" set of tags, even in a single language such as English. For example, it is hard to say whether "fire" is functioning as an adjective or a noun in

    the big green fire truck

A second important example is the use/mention distinction, as in the following example, where "blue" is clearly not functioning as an adjective (the Brown Corpus tag set appends the suffix "-NC" in such cases):

    the word "blue" has 4 letters.

Words in a language other than that of the "main" text are commonly tagged as "foreign", usually in addition to a tag for the role the foreign word is actually playing in context.

There are also many cases where POS categories and "words" do not map one to one, for example:

    David's
    gonna
    don't
    vice versa
    first-cut
    cannot
    pre- and post-secondary
    look (a word) up
In the last example, "look" and "up" arguably function as a single verbal unit, despite the possibility of other words coming between them. Some tag sets (such as Penn) break hyphenated words, contractions, and possessives into separate tokens, thus avoiding some but far from all such problems.

It is unclear whether it is best to treat words such as "be", "have", and "do" as categories in their own right (as in the Brown Corpus), or as simply verbs (as in the LOB Corpus and the Penn Treebank). "Be" has more forms than other English verbs, and occurs in quite different grammatical contexts, complicating the issue.

The most popular "tag set" for POS tagging for American English is probably the Penn tag set, developed in the Penn Treebank project. It is largely similar to the earlier Brown Corpus and LOB Corpus tag sets, though much smaller. In Europe, tag sets from the Eagles Guidelines see wide use, and include versions for multiple languages.

POS tagging work has been done in a variety of languages, and the set of POS tags used varies greatly with language. Tags usually are designed to include overt morphological distinctions (this makes the tag sets for heavily inflected languages such as Greek and Latin very large, and makes tagging words in agglutinative languages such as Inuit virtually impossible). However, S. Petrov, D. Das, and R. McDonald ("A Universal Part-of-Speech Tagset", http://arxiv.org/abs/1104.2086) have proposed a "universal" tag set, with 12 categories (for example, no subtypes of nouns, verbs, punctuation, etc.; no distinction of "to" as an infinitive marker vs. preposition, etc.). Whether a very small set of very broad tags, or a much larger set of more precise ones, is preferable depends on the purpose at hand. Automatic tagging is easier on smaller tag sets.

See also

    Semantic net
    Sliding window based part-of-speech tagging
    Trigram tagger
    Word sense disambiguation

References

    Charniak, Eugene. 1997. "Statistical Techniques for Natural Language Parsing".
    AI Magazine 18(4):33–44.
    Hans van Halteren, Jakub Zavrel, Walter Daelemans. 2001. "Improving Accuracy in NLP Through Combination of Machine Learning Systems." Computational Linguistics 27(2): 199–229.
    DeRose, Steven J. 1990. "Stochastic Methods for Resolution of Grammatical Category Ambiguity in Inflected and Uninflected Languages." Ph.D. Dissertation. Providence, RI: Brown University Department of Cognitive and Linguistic Sciences.
    DeRose, Steven J. 1988. "Grammatical category disambiguation by statistical optimization." Computational Linguistics 14(1): 31–39.