Chinese Grammar vs English Grammar in Universal Dependency


Published on

SIRE 2016

Published in: Technology

Hang Jiang, Jinho D. Choi, PhD
Department of Mathematics and Computer Science, Emory University, Atlanta, GA, USA 30322

Introduction

UD (Universal Dependency) is an annotation scheme for multilingual dependency structures that provides a universal grammar.
• A dependency relation is a linguistic relation between words; it mainly covers notions such as subject, object, clausal complement, noun modifier and noun determiner.
• UD therefore defines a set of syntactic rules for labeling the relations between words with dependency relations.

               English   Spanish   French   Hindi   Arabic
Tokens #       254K      423K      389K     351K    282K
Sentences #    16K       16K       16K      16K     7K
Fig.1 The size of UD structures for some languages

Motivation for UD

The aims of UD
• Provide a concise, generic set of features that are important for analyzing different languages.
• Annotate different corpora consistently across languages, with extensions for specific languages where needed.
• Ultimately make parsing easier and more accurate.

Status of UD
• 47 languages and dialects have their own treebanks.
• Chinese, the most spoken language in the world, is not among them.

Project Goal
In this project, we compare Chinese and English in UD in order to establish the basic differences that matter for building a Chinese treebank in UD.

English Structures fit Chinese

Contrary to common intuition, English and Chinese share a similar basic grammar that can be explained by Universal Grammar. English shares similar structures with Chinese in:
• Core dependents of clausal predicates (objects, subjects and complements, but not clausal complements)
• Root, coordination and loose joining
These and other similar dependents show that UD can basically be applied to Chinese. Fig.2 gives a good example.
Fig.2 The graph representation shows that the Chinese and English sentences have closely corresponding dependency relations in many cases.
• The dependency relations are strikingly similar at the word-to-word level for both the Chinese and the English sentence.

English UD Examples unfit Chinese

English has many distinctive features that make us wonder whether English has brought some extra relations into UD that other languages may not need. English grammar differs dramatically from Chinese grammar mainly in (but not limited to):
• noun dependents (acl, det)
• non-core dependents of clausal predicates (nmod, advmod, neg)
• special clausal dependents (vocative, aux, mark, discourse, auxpass, expl)
• case markers (case)
The English of-expression is a good example of such a difference.
Fig.3 An alternative way of saying ‘the weather office won’
• The structure corresponding to the English of-expression does not exist in Chinese. However, regular noun modifiers in Chinese are often followed by de (的), which is also a case relation, so the case relation still exists in Chinese.
However, the following example is an exception found in our project.
Fig.4 The expletive it in English doesn’t exist in Chinese.
• The expletive it does not exist in Chinese at all. Instead, Chinese allows pro-drop and assumes that the subject is the weather in this context. However, it is still indisputable that expl is necessary across languages.
• As a result, the UD relations are considered very concise and generic after comparing Chinese and English grammar.

Chinese Structures Missing in UD examples

Chinese has many structural features that differ from English. They are mainly found in (but not limited to):
• noun classifiers
• prepositions, postpositions
• adjectives, comparatives
• aspect markers
• auxiliaries
Below are two Chinese examples with clear dependency relations.
1. The first example is about the consecutive use of verbs in Chinese.
Fig.5 The corresponding English for this example is “He walks up (to somewhere).”
• The consecutive use of verbs in Chinese can be treated as asyndetic conjunction, meaning that the coordinating conjunction is omitted.
2. The second example is the use of prepositions and postpositions in Chinese.
Fig.6 The sentence means “At school, I am always criticized.”
• 在 (at) and 里 (inside) are a preposition and a postposition in Chinese, respectively.
Nevertheless, the ba sentence is an exception, and we have to assign the ambiguous dep relation to it. See the example in Fig.7.
Fig.7 The English translation is “It was I that let John finish and check homework for one time.”
• In this SOV ba sentence, it is not possible to treat ba as a preposition and assign a case relation between ba and John (约翰), because every word can have only one head in a dependency structure. As a result, the isolated ba has to be related by dep to the verb following ba.

Contributions and Future Work

Contributions
• Show that UD is robust and basically compatible with Chinese
• Identify the ba sentence as a counterexample in which Chinese does not fit the UD relations
• Provide clear relations, instead of dep, for distinctive Chinese structures in order to adapt UD to Chinese better than the Stanford parser does

Future Work
• Explore in more detail how UD can be adapted to Chinese by adapting universal features and POS tags to Chinese morphology
• Build a comprehensive guideline for Chinese UD and then construct a Chinese UD treebank

Reference & Acknowledgement

Reference
• Choi, Jinho D., and Martha Palmer. "Guidelines for the Clear Style Constituent to Dependency Conversion." Technical Report 01-12, University of Colorado at Boulder, 2012.
• De Marneffe, Marie-Catherine, and Christopher D. Manning. "The Stanford Typed Dependencies Representation." Coling 2008: Proceedings of the Workshop on Cross-Framework and Cross-Domain Parser Evaluation. Association for Computational Linguistics, 2008.
• McDonald, Ryan T., et al. "Universal Dependency Annotation for Multilingual Parsing." ACL (2). 2013.

Acknowledgement
• This research was supported by Emory NLP, which assisted with the Emory NLP demo.
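
The constraint used in the ba-sentence discussion, that every word can have only one head, can be illustrated with a minimal Python sketch. It is not taken from the poster: the arc encoding (head, relation, dependent), the function name tokens_without_single_head, the noun 学校 (school) and both analyses of the fragment 在 学校 里 are assumptions made purely for illustration. Only 在, 里 and the relation names case and dep come from the text above.

```python
from collections import Counter

# Hypothetical fragment "在 学校 里" ("at school"); tokens are numbered from 1,
# and head 0 stands for the artificial root of the sentence.
tokens = ["在", "学校", "里"]

# Assumed analysis: both the preposition 在 and the postposition 里 attach
# to the noun 学校 with the case relation.
arcs_ok = [
    (0, "root", 2),  # 学校 is the head of the fragment
    (2, "case", 1),  # 在 -> case dependent of 学校
    (2, "case", 3),  # 里 -> case dependent of 学校
]


def tokens_without_single_head(n_tokens, arcs):
    """Return the 1-based indices of tokens that do not have exactly one head."""
    head_counts = Counter(dependent for _head, _relation, dependent in arcs)
    return [i for i in range(1, n_tokens + 1) if head_counts[i] != 1]


# Every token has exactly one head, so nothing is reported.
print(tokens_without_single_head(len(tokens), arcs_ok))   # []

# Hypothetical ill-formed analysis: token 1 is given a second head via dep.
arcs_bad = arcs_ok + [(3, "dep", 1)]
print(tokens_without_single_head(len(tokens), arcs_bad))  # [1]
```

A check of this kind only tests whether an analysis is a well-formed dependency structure. It does not decide which relation label is linguistically appropriate for ba.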