Linguistic Essentials for NLP


An introduction to some basic concepts from linguistics that may help when dealing with NLP tasks. This note is a reconstruction of Chapter 3 of "Foundations of Statistical Natural Language Processing" by Christopher D. Manning and Hinrich Schütze, supplemented with external sources such as the Coursera NLP course taught by Dragomir R. Radev at the University of Michigan.

  1. Linguistic Essentials for NLP (SNU IDS Lab, Jamie Seol)
  2. Quiz • A ( 1 ) is a category of words that have similar syntactic or grammatical behavior. • ( 2 ) is the study of the regularities and constraints of word order and phrase structure. • ( 3 ) is the study of the meaning of words, constructions, and utterances. ( 3 ) can be divided into two parts: the study of the meaning of individual words, and the study of how the meanings of individual words combine into the meaning of sentences. • (Bonus point) List the 4 major types of phrases.
  3. Language (언어) • The definition of language varies with perspective • Invariant: it is a system for communication • There is a great deal one could say about “What is a language?” • Here, we focus on natural human language as studied in linguistics, which is characterized by: • productivity • syntax • recursion • displacement • modality independence
  4. Language - Appendix • Natural human language is defined as a system for complex communication using signs, gestures, sounds, symbols, etc. • The complexity is achieved through double articulation and syntax (note that drawings have no syntax) • The properties on the previous slide and above may appear individually even in non-human or non-linguistic communication systems such as bee signs or baby cries, but natural human language is the only known system that has all of them together • Natural human language has two major parts, a phonological system and a syntactic system; treating them as entirely separate concepts, however, is quite dangerous! • A language has many other properties as well; it is a very sophisticated system
  5. Language - Appendix • Examples of various languages • formal language: a set of strings of symbols constrained by finite (but possibly recursive) rules, with the potential to construct complete and sound axiomatic systems • programming language: an extension of formal language that can specify a Turing-complete system • baby cries: a typical non-linguistic communication system, which is modality dependent, non-recursive, and lacks displacement • bee signs: a typical non-human communication system, modality dependent and non-recursive, but with displacement! • surprisingly, bees can precisely communicate the location of nectar sources even when they are out of sight
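The idea of a formal language as "finite but possibly recursive rules" can be illustrated with a minimal sketch. The toy grammar below (S → a S b | a b) and the helper name `expand` are assumptions chosen for illustration; they do not come from the slides. One recursive rule is enough to describe an unbounded set of strings, which is the same mechanism behind productivity and recursion in natural language.

```python
from itertools import product

# Hypothetical toy grammar (illustrative only, not from the slides):
#   S -> "a" S "b" | "a" "b"
# A finite rule set, but the first rule is recursive.
GRAMMAR = {
    "S": [["a", "S", "b"], ["a", "b"]],
}


def expand(symbol, depth):
    """Return every terminal string derivable from `symbol`
    using at most `depth` nested rule applications."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal symbols expand to themselves
    if depth == 0:
        return []  # recursion budget exhausted for a non-terminal
    results = []
    for rule in GRAMMAR[symbol]:
        # Expand each symbol on the right-hand side, then combine the pieces.
        parts = [expand(s, depth - 1) for s in rule]
        for combo in product(*parts):
            results.append("".join(combo))
    return results


strings = sorted(expand("S", 3), key=len)
print(strings)
```

Raising `depth` yields ever longer strings from the same two rules; the depth bound exists only so that the enumeration terminates.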
  6. Language - Appendix • Classifying languages is a very hard task • 3 major types of language families • dialect continua, isolates, proto-languages • In terms of morphological typology, there are 4 types: • agglutinative: derivation occurs a lot • inflectional: inflection occurs a lot • isolating: like Chinese; requires positional (word-order) information to determine a word’s meaning, and words neither derive nor inflect • polysynthetic: very long words built by concatenating many morphemes, so that a single word acts almost as a sentence
  7. Sentence (문장) • A sentence is a sequence of words that is complete in itself and can express a statement, question, command, etc. • it is composed of one or more clauses, and is complete • Empirically, we know that (for example, in English) letters → words → phrases → clauses → sentences → paragraphs → documents → … → languages → ? • But we cannot deal with an infinite hierarchy! • For now, we will only discuss units no larger than a sentence • semantics and pragmatics can cover cases spanning multiple sentences
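The letters → words → sentences → paragraphs hierarchy above can be sketched as nested lists. This is a deliberately naive, English-only illustration: the function name `segment`, the regular expressions, and the sample text are assumptions made for this sketch, and real sentence splitting needs to handle abbreviations, quotations, and other cases that it ignores. Phrases and clauses are skipped because recovering them requires syntactic analysis.

```python
import re


def segment(document):
    """Split a document into paragraphs -> sentences -> words.

    Naive assumptions: paragraphs are separated by blank lines,
    sentences end with '.', '!' or '?', and words are runs of
    ASCII letters and apostrophes.
    """
    paragraphs = [p for p in document.split("\n\n") if p.strip()]
    result = []
    for paragraph in paragraphs:
        # Split after sentence-final punctuation followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
        result.append([re.findall(r"[A-Za-z']+", s) for s in sentences])
    return result


# Hypothetical sample text, for illustration only.
doc = "She smiled. He left because she smiled at her!\n\nWas it late?"
nested = segment(doc)
for paragraph in nested:
    print(paragraph)
```

Each level of nesting corresponds to one arrow in the hierarchy: the outer list holds paragraphs, each paragraph holds sentences, and each sentence holds words, which are themselves sequences of letters.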
  8. Clause (절) • A clause is a sequence of words that contains exactly one subject-predicate relationship • “because she smiled at her” • this is a typical dependent clause • if a clause is complete in itself, it can stand as a sentence, and we call it an independent clause • “놀랍게도