DISCOURSE SEMANTICS OF S-MODIFYING ADVERBIALS

Katherine M. Forbes

A DISSERTATION in Linguistics

Presented to the Faculties of the University of Pennsylvania in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

2003

Bonnie Webber, Supervisor of Dissertation
Ellen Prince, Supervisor of Dissertation
Donald A. Ringe, Graduate Group Chair
Aravind Joshi, Committee Member
Robin Clark, Committee Member
Acknowledgements

I wish to thank Bonnie Webber. Without her patience and her seemingly endless depths of insight, I might never have completed this thesis. I am enormously grateful for her guidance. I also owe many thanks to Ellen Prince. She is an intellectual leader at Penn who has helped many, including me, find a way through the jungle of discourse analysis. I am indebted to every professor who has taught me. Special thanks to Robin Clark for being a member of my dissertation committee. I am very lucky to have worked with Aravind Joshi. He is a continual source of knowledge in the DLTAG meetings. The field of computational linguistics has already benefited from his sentence-level work; I fully expect he and Bonnie will produce similarly useful results with DLTAG. Also in DLTAG, Eleni Miltsakaki and Rashmi Prasad, and later Cassandre Creswell and Jason Teeple, all provided stimulation and solace. Their great company and great effort on DLTAG projects taught me to appreciate how much can be done when minds work together. I look forward to the chance to work with them in the future. I am also thankful to Martha Palmer, Paul Kingsbury, and Scott Cotton for allowing me to work with them on the Propbank project and supplement both my income and my work in discourse. On a personal note, the Forbes, Finley, and Riley families deserve thanks for giving me love and diversion and balance and talking me through my education. Most of all, thanks to Enrico Riley, for being everything to me.
ABSTRACT

DISCOURSE SEMANTICS OF S-MODIFYING ADVERBIALS

Katherine M. Forbes
Supervisors: Bonnie Webber and Ellen Prince

In this thesis, we address the question of why certain S-modifying adverbials are only interpretable with respect to the discourse or spatio-temporal context, and not just their own matrix clause. It is not possible to list these adverbials because the set of adverbials is compositional and therefore infinite. Instead, we investigate the mechanisms underlying their interpretation. We present a corpus-based analysis of the predicate-argument structure and interpretation of over 13,000 S-modifying adverbials. We use prior research on discourse deixis and clause-level predicates to study the semantics of the arguments of S-modifying adverbials and the syntactic constituents from which they can be derived. We show that many S-modifying adverbials contain semantic arguments that may not be syntactically overt, but whose interpretation nevertheless requires an abstract object from the discourse or spatio-temporal context. Prior work has investigated only a small subset of these discourse connectives; at the clause level their semantics has been largely ignored, and at the discourse level they are usually treated as "signals" of predefined lists of abstract discourse relations. Our investigation sheds light on the space of relations imparted by a much wider variety of adverbials. We further show how their predicate-argument structure and interpretation can be formalized and incorporated into a rich intermediate model of discourse that, alone among existing models, views discourse connectives as predicates whose syntax and semantics must be specified and recoverable to interpret discourse. It is not only due to their argument structure and interpretation that adverbials have been treated as discourse connectives, however. Our corpus contains adverbials whose semantics alone does not cause them to be interpreted with respect to abstract object interpretations in the discourse or spatio-temporal context. We explore other explanations for why these adverbials evoke discourse context for their interpretation; in particular, we show how the interaction of prosody with the interpretation of S-modifying adverbials can contribute to discourse coherence, and we also show how S-modifying adverbials can be used to convey implicatures.
Contents

Acknowledgements
Abstract
Contents
List of Tables
List of Figures

1 Introduction
  1.1 The Problem
  1.2 Contributions of the Thesis
  1.3 Thesis Outline

2 Anaphora and Discourse Models
  2.1 Introduction
  2.2 Descriptive Theories of Discourse Coherence
    2.2.1 An Early Encompassing Description
    2.2.2 Alternative Descriptions of Propositional Relations
    2.2.3 Discourse Relations as Constraints
    2.2.4 Abducing Discourse Relations by Applying the Constraints
    2.2.5 Interaction of Discourse Inference and VP Ellipsis
    2.2.6 Summary
  2.3 A Three-Tiered Model of Discourse
    2.3.1 The Three Tiers
    2.3.2 Coherence within Discourse Segments
    2.3.3 Modeling Linguistic Structure and Attentional State as a Tree
    2.3.4 Introduction to Discourse Deictic Reference
    2.3.5 Retrieving Antecedents of Discourse Deixis from the Tree
    2.3.6 Summary
  2.4 A Tree Structure with a Syntax-Semantic Interface
    2.4.1 Constituents and Tree Construction
    2.4.2 The Syntax-Semantic Interface
    2.4.3 Retrieving Antecedents of Anaphora from the Tree
    2.4.4 The Need for Upward Percolation
    2.4.5 Summary
  2.5 A Descriptive Theory of Discourse Structure
    2.5.1 Analyzing Text Structure
    2.5.2 The Need for Multiple Levels of Discourse Structure
    2.5.3 "Elaboration" as Reference
    2.5.4 Summary
  2.6 A Semantic Theory of Discourse Coherence
    2.6.1 Abstract Objects
    2.6.2 A Formal Language for Discourse
    2.6.3 Retrieving Antecedents of Anaphora from the Discourse Structure
    2.6.4 A System for Inferring Discourse Relations
    2.6.5 Extending the Theory to Cognitive States
    2.6.6 Summary
  2.7 Discussion
    2.7.1 Proliferation of Discourse Relations
    2.7.2 Use of Linguistic Cues as Signals
    2.7.3 Structural and Anaphoric Cue Phrases
    2.7.4 Comparison of DLTAG and Other Models
    2.7.5 Remaining Questions
  2.8 Conclusion

3 Semantic Mechanisms in Adverbials
  3.1 Introduction
  3.2 Linguistic Background and Data Collection
    3.2.1 Function of Adverbials
    3.2.2 Structure of PP and ADVP
    3.2.3 Data Collection
    3.2.4 Summary
  3.3 Adverbial Modification Types
    3.3.1 Clause-Level Analyses of Modification Type
    3.3.2 Problems with Categorical Approaches
    3.3.3 Modification Types as Semantic Features
    3.3.4 Summary
  3.4 Adverbial Semantic Arguments
    3.4.1 (Optional) Arguments or Adjuncts?
    3.4.2 External Argument Attachment Ambiguity
    3.4.3 Semantic Representation of External Argument
    3.4.4 Semantic Arguments as Abstract Objects
    3.4.5 Number of Abstract Objects
    3.4.6 Summary
  3.5 S-Modifying PP Adverbials
    3.5.1 Proper Nouns, Possessives, and Pronouns
    3.5.2 Demonstrative and Definite Determiners
    3.5.3 Indefinite Articles, Generic and Plural Nouns, and Optional Arguments
    3.5.4 PP and ADJP Modifiers
    3.5.5 Other Arguments
    3.5.6 Summary
  3.6 S-Modifying ADVP Adverbials
    3.6.1 Syntactically Optional Arguments
    3.6.2 Context-Dependent ADVP Adverbials
    3.6.3 Comparative ADVP
    3.6.4 Sets and Worlds
    3.6.5 Summary
  3.7 Conclusion

4 Incorporating Adverbial Semantics into DLTAG
  4.1 Introduction
  4.2 Syntax-Semantic Interfaces at the Sentence Level
    4.2.1 The Role of the Syntax-Semantic Interface
    4.2.2 LTAG: Lexicalized Tree Adjoining Grammar
    4.2.3 A Syntax-Semantic Interface for LTAG Derivation Trees
    4.2.4 A Syntax-Semantic Interface for LTAG Elementary Trees
    4.2.5 Comparison of Approaches
    4.2.6 Summary
  4.3 Syntax-Semantic Interfaces at the Discourse Level
    4.3.1 DLTAG: Lexicalized Tree Adjoining Grammar for Discourse
    4.3.2 Syntax-Semantic Interfaces for Derived Trees
    4.3.3 A Syntax-Semantic Interface for DLTAG Derivation Trees
    4.3.4 Comparison of Approaches
    4.3.5 Summary
  4.4 DLTAG Annotation Project
    4.4.1 Overview of Project
    4.4.2 Preliminary Study 1
    4.4.3 Preliminary Study 2
    4.4.4 Future Work
  4.5 Conclusion

5 Other Ways Adverbials Contribute to Discourse Coherence
  5.1 Introduction
  5.2 Focus
    5.2.1 The Phenomena
    5.2.2 Information-Structure and Theories of Structured Meanings
    5.2.3 Alternative Semantics
    5.2.4 Backgrounds or Alternatives?
    5.2.5 Contrastive Themes
    5.2.6 Summary
  5.3 Focus Sensitivity of Modifiers
    5.3.1 Focus Particles
    5.3.2 Other Focus Sensitive Sub-Clausal Modifiers
    5.3.3 S-Modifying "Focus Particles"
    5.3.4 Focus Sensitivity of S-Modifying Adverbials
    5.3.5 Focusing S-Modifying Adverbials to Evoke Context
    5.3.6 Summary
  5.4 Implicatures
    5.4.1 Gricean Implicature
    5.4.2 Pragmatic and Semantic Presupposition
    5.4.3 Summary
  5.5 Using S-Modifying Adverbials to Convey Implicatures
    5.5.1 Presupposition
    5.5.2 Conversational Implicatures
    5.5.3 Interaction of Focus and Implicature
    5.5.4 Summary
  5.6 Other Contributions
    5.6.1 Discourse Structure
    5.6.2 Performatives
  5.7 Conclusion

6 Conclusion
  6.1 Summary
  6.2 Future Directions

Bibliography
List of Tables

2.1 Main Categories of [HH76]'s Relations between Propositions
2.2 Main Categories of [Lon83]'s Relations between Propositions
2.3 Main Categories of [Mar92]'s Relations between Propositions
2.4 Main Categories of [Hob90]'s Relations between Propositions
2.5 [Keh95]'s Cause-Effect Relations
2.6 [Keh95]'s Resemblance Relations
2.7 [GS86] Changes in Discourse Structure Indicated by Linguistic Expressions
2.8 Centering Theory Transitions
2.9 [Web91]'s Classification of Discourse Deictic Reference
2.10 Organizations of RST Relation Definitions
2.11 Evidence: RST Relation Definition
2.12 Volitional-Cause: RST Relation Definition
2.13 Elaboration: RST Relation Definition
2.14 [Ven67]'s Imperfect and Perfect Nominalizations
2.15 [Ven67]'s Loose and Narrow Containers
2.16 DICE: discourse relation definitions
2.17 DICE: Indefeasible axioms
2.18 DICE: Defeasible laws on world knowledge
2.19 DICE: Defeasible laws on discourse processes
2.20 DICE: Deduction rules
2.21 [Kno96]'s Features of Discourse Connectives
3.1 Non-Derived and Derived Adverbs
3.2 tgrep Results for S-Adjoined ADVP and PP in WSJ and Brown Corpora
3.3 Total S-Adjoined Adverbials in WSJ and Brown Corpora
3.4 [Ale97]'s Modification Types
3.5 [Ern84]'s Modification Types
3.6 [KP02]'s Modification Types
3.7 [Gre69]'s Syntactic Tests for Distinguishing VP and S Modification
3.8 Semantic Interpretations of [Ern84]'s Modification Types
3.9 Abstract Object Interpretations
3.10 Approximate Counts of Tokens and Types of some Internal PP arguments
3.11 PP Adverbials with Proper Noun or Year Internal Argument
3.12 PP Adverbial with Possessive Proper Noun Internal Argument
3.13 PP Adverbials with Pronominal Internal Argument
3.14 PP Adverbial with Possessive Pronoun
3.15 Approximate Counts of Tokens and Types of some Internal PP arguments
3.16 PP Adverbials with Definite Concrete Object Internal Argument
3.17 PP Adverbials with Definite AO Internal Argument
3.18 PP Adverbials with Demonstrative Concrete Object Internal Argument
3.19 PP Adverbials with Demonstrative AO Internal Arguments
3.20 Approximate Counts of Tokens and Types of some Internal PP arguments
3.21 PP Adverbial with Indefinite Concrete Object Internal Argument
3.22 PP Adverbial with Indefinite AO Internal Argument
3.23 PP Adverbial with Relational Indefinite AO Internal Argument
3.24 PP Adverbials with Generic or Plural Concrete Object Internal Arguments
3.25 PP Adverbials with Generic or Plural AO Internal Arguments
3.26 PP Adverbials with Relational Generic AO Internal Arguments
3.27 Approximate Counts of Tokens and Types of some Internal Argument Modifiers
3.28 Binary Definite Internal Argument with Overt Argument
3.29 Binary Indefinite Internal Argument with Overt Argument
3.30 Binary Generic or Plural Internal Argument with Overt Argument
3.31 Internal Argument with a Spatio-Temporal ADJ
3.32 Internal Argument with Referential Adjective
3.33 Internal Argument with Non-Referential Adjective
3.34 Internal Argument with Determiner and Non-Referential Adjective
3.35 Internal Argument with Ordinal Adjective
3.36 Internal Argument with Alternative Phrase
3.37 Internal Argument with Determiner and Alternative Phrase
3.38 Internal Argument with Comparative/Superlative Adjective
3.39 Internal Argument with Other Set-Evoking Adjectives
3.40 Approximate Counts of Tokens and Types of some Internal PP arguments
3.41 PP Adverbial with Reduced Clause Internal Argument
3.42 PP Adverbial Summary
3.43 Approximate Counts of Tokens and Types of some ADVP Adverbials
3.44 Mis-Tagged PP Adverbials
3.45 PP-like ADVP Adverbials with Overt Arguments
3.46 PP-like ADVP Adverbials with Hidden Argument
3.47 Relational ADJP with Overt Argument
3.48 Relational ADVP Adverbials with Hidden Argument
3.49 Approximate Counts of Tokens and Types of some ADVP Adverbials
3.50 ADVP Adverbial Conjunctions
3.51 Mis-Tagged PP Adverbial Constructions
3.52 Spatio-Temporal ADVP Adverbials
3.53 Another Spatio-Temporal ADVP Adverbial
3.54 Other Spatio-Temporal ADVP Adverbials
3.55 Spatio-Temporal Manner ADVP Adverbials
3.56 Deictic ADVP Adverbials
3.57 Deictic-Derived ADVP Adverbials
3.58 Approximate Counts of Tokens and Types of some ADVP Adverbials
3.59 Comparative Adverb Modifiers
3.60 Comparative ADVP Adverbials
3.61 Specified Comparative ADVP Adverbials
3.62 Comparative-Derived ADVP Adverbials
3.63 Comparative Constructions
3.64 Approximate Counts of Tokens and Types of some ADVP Adverbials
3.65 Ordinal ADVP Adverbials
3.66 Ordinal -ly ADVP Adverbials
3.67 Frequency ADVP Adverbials
3.68 Epistemic ADVP Adverbials
3.69 Domain ADVP Adverbials
3.70 Non-Specific Set-Evoking ADVP Adverbials
3.71 Multiply-Featured ADVP Adverbials
3.72 More Multiply-Featured ADVP Adverbials
3.73 Evaluative or Agent-Oriented ADVP Adverbials
3.74 ADVP Adverbial Summary
4.1 Nine Connectives Studied in [CFM+02]
4.2 Annotation Tags for the Nine Connectives Studied in [CFM+02]
4.3 LOC Tag Values
4.4 Inter-Annotator Agreement
5.1 ADVP/PP Adverbials with Focus Particle Modifier
5.2 Higher-Ordered Epistemic Adverbials Yielding Implicatures
5.3 Lower-Ordered Epistemic Adverbials Yielding Implicatures
5.4 Lower-Ordered Quantificational Adverbials Yielding Implicatures
5.5 Higher-Ordered Quantificational Adverbials Yielding Implicatures
List of Figures

2.1 [HH76]'s Types of Cohesion
2.2 Illustration of [GS86]'s Discourse Model
2.3 Illustration of [Web91]'s Attachment Operation
2.4 Illustration of [Web91]'s Adjunction Operation
2.5 LDM Right-Attachment Operation
2.6 LDM Tree Structure for Example (2.58)
2.7 LDM Tree Structure for Example (2.59)
2.8 RST Schemas
2.9 Evidence Relation
2.10 RST Condition and Motivation Relations
2.11 [KOOM01]'s Discourse Model
2.12 [Ash93]'s Classification of Abstract Objects
2.13 Sample DRSs
2.14 Sample SDRS
2.15 Elementary DLTAG Trees
3.1 S-Adjoining PP and ADVP
3.2 S-Adjoined Discourse and Clausal Adverbials
3.3 S-Adjoined ADVP and PP Adverbials in Penn Treebank I
3.4 [Ash93]'s Classification of Abstract Objects
3.5 Syntactic Structure of S-Modifying PP Adverbials
3.6 Syntactic Structure of S-Modifying ADVP Adverbials
4.1 Elementary LTAG Trees
4.2 LTAG Derived Tree after Substitutions
4.3 LTAG Derived Tree after Adjunction
4.4 LTAG Derivation Tree
4.5 Semantic Representations of …
4.6 Semantic Representations of John walks
4.7 Semantic Representations of …
4.8 Semantic Representations of John often walks Fido
4.9 Simplified Semantic Representation of …
4.10 The Elementary Tree for slide
4.11 The Syntax-Semantic Interface for …
4.12 DLTAG Initial Trees for Subordinating Conjunctions
4.13 DLTAG Auxiliary Tree for and
4.14 DLTAG Auxiliary Trees for Discourse Adverbials
4.15 DLTAG Initial Tree for Adverbial Constructions
4.16 DLTAG Derived Tree for Example (4.18)
4.17 DLTAG Derivation Tree for Example (4.18)
4.18 Illustration of [Web91]'s Attachment and Adjunction Operations
4.19 Webber's Adjunction at a Leaf
4.20 Derived Tree for Example (2.41)
4.21 Substitution in FTAG
4.22 Adjunction in FTAG
4.23 LDM Elementary DCU
4.24 DTAG Elementary DCU
4.25 LDM List Rule
4.26 DTAG R Tree
4.27 [Gar97b]'s …-Substitution
4.28 [Gar97b]'s …-Adjunction
4.29 First DTAG Derivation of Example (4.21)
4.30 Second DTAG Derivation of Example (4.21)
4.31 Step One in the Second DTAG Derivation of Example (4.21)
4.32 Step Two in the Second DTAG Derivation of Example (4.21)
4.33 Step Three in the Second DTAG Derivation of Example (4.21)
4.34 Step Four in the Second DTAG Derivation of Example (4.21)
4.35 Step Five in the Second DTAG Derivation of Example (4.21)
4.36 Step Six in the Second DTAG Derivation of Example (4.21)
4.37 Step Seven in the Second DTAG Derivation of Example (4.21)
4.38 DLTAG Elementary Trees for Example (4.22)
4.39 Semantic Representation of …
4.40 DLTAG Derived and Derivation Trees and Semantic Representation for (4.22)
4.41 DLTAG Elementary Trees for Example (4.24)
4.42 Semantic Representation of …
4.43 DLTAG Derived and Derivation Trees and Semantic Representation for (4.24)
4.44 DLTAG Derived and Derivation Trees and Semantic Representation for (4.26)
4.45 DLTAG Derived and Derivation Trees and Semantic Representation for (4.28)
4.46 LTAG Derived and Derivation Trees for Example (4.30)
4.47 Quantifiers in French
4.48 [Kal02]'s …-Edges for Quantifiers in French
4.49 [Kal02]'s …-Derivation Graph
4.50 DLTAG Derived Tree and …-Derivation Graph for Example (4.28)
4.51 Additional Semantic Representation for (4.28) due to …-Derivation Graph
4.52 DLTAG Derived Tree and …-Derivation Graph for Example (4.32)
4.53 Flexible Composition in LTAG
4.54 DLTAG Derived and Derivation Trees and Semantic Representation for (4.28)
4.55 DLTAG Elementary Trees for Example (4.33)
4.56 Semantic Representation of …
4.57 DLTAG Derived and Derivation Trees, …-Derivation Graph and Semantics for (4.33)
4.58 DLTAG Elementary Trees for Example (4.35)
4.59 Semantic Representation of …
4.60 DLTAG Derived and Derivation Trees, …-Derivation Graph, and Semantics for (4.35)
4.61 DLTAG Elementary Trees for Example (4.37)
4.62 Semantic Representation of …
4.63 DLTAG Derived and Derivation Trees, …-Derivation Graph and Semantics for (4.37)
4.64 DLTAG Elementary Tree and Semantic Representation for … in (4.39)
4.65 DLTAG Derived and Derivation Trees, …-Derivation Graph and Semantics for (4.39)
4.66 DLTAG Derivation Tree and …-Derivation Graph for Example (4.41)
4.67 Elementary LTAG Trees and Semantic Representations of …
4.68 Elementary DLTAG Trees for for example
4.69 Derivation Trees for PP Discourse Adverbials with Quantified Internal Arguments
4.70 DLTAG Derived and Derivation Trees for (4.32)
4.71 Another Representation of the R Tree in Figure 4.26
5.1 Gricean Framework
Chapter 1

Introduction

1.1 The Problem

Traditionally in linguistic theory, syntax and semantics provide mechanisms to build the interpretation of a sentence from its parts; although it is uncontroversial that a sequence of sentences such as those in (1.1)-(1.3) also has an interpretation, the mechanisms which produce it are not defined.

(1.1) There is a high degree of stress level from the need to compete and succeed in this 'me generation'. As a result, people have become more self-centered over time.

(1.2) John has finally been rewarded for his great talent. Specifically, he just won a gold medal for mogul-skiing in the Olympics.

(1.3) The company interviewed everyone who applied for the position. In this way, they considered all their options.

Most discourse theories go beyond sentence-level linguistic theory to explain how such sequences are put together to create a discourse interpretation. These theories evoke the notion of abstract discourse relations between discourse units, provide lists of these relations of varying length and organization, and propose discourse models constructed from these relations and units. Some of these models produce compositional accounts of discourse structure and/or interpretation ([Pol96, Ash93, MT88, GS86]); others produce accounts of how relations between units are inferred ([Keh95, HSAM93, LA93]). The majority make use of the presence of cue phrases, or
    • discourse connectives, treating them as “signals” of the presence of particular discourse relations. In (1.1), for example, the relevant cue phrase is the adverbial as a result, and the discourse relation it signals is frequently classified as a result relation. Along with certain adverbials, the subordinating and coordinating conjunctions are also classified as discourse connectives in these theories.     DLTAG [FMP 01, CFM 02, WJSK03, WKJ99, WJSK99, WJ98] is a theory that bridges the gap between clause-level and discourse-level theories, providing a model of a rich intermediate level between clause structure and high-level discourse structure, namely, the syntax and semantics associated with discourse connectives. In DLTAG, discourse connectives are predicates, akin to verbs at the clause level, except that they take discourse units as arguments. DLTAG proposes to build the interpretation of these predicates directly on top of the clause, using the same syntactic and semantic mechanisms that are already used to build the clause. Based on considerations of computational economy and behavioral evidence, DLTAG argues that both arguments of subordinating and coordinating conjunctions can be represented structurally, but only one argument of adverbial discourse connectives comes structurally; the other argument must be resolved anaphorically. However, while DLTAG has shown that certain adverbials function as discourse connectives, it has not isolated the subset of adverbials which function as discourse connectives from the set of all adverbials. The set of all adverbials is a large set; in fact, it is compositional and therefore infinite[Kno96]. Because it is thus not possible to list all of the adverbials that function as discourse connectives, in this thesis we investigate how semantics and pragmatics cause an adverbial to function as a discourse connective. 1.2 Contributions of the Thesis This thesis extends the DLTAG model, investigating the semantics and pragmatics underlying the behavioral anaphoricity of adverbial discourse connectives. We present a corpus-based analysis of over 13,000 S-modifying adverb (ADVP) and preposition (PP) adverbials in the Penn Treebank Corpus [PT]. We show that certain adverbials, which we call discourse adverbials, can be distinguished semantically from other adverbials, which we call clausal adverbials. Some clausal adverbials from our corpus are shown in (1.4), and some discourse adverbials from our corpus are shown in (1.5). 2
Some clausal adverbials from our corpus are shown in (1.4), and some discourse adverbials from our corpus are shown in (1.5).

(1.4) Probably/In my city/In truth, women take care of the household finances.

(1.5) As a result/Specifically/In this way, women take care of the household finances.

The most frequently occurring clausal and discourse adverbials have both been classified in the literature as discourse connectives, due to the fact that they seem to be interpretable only with respect to context. In this thesis we will show that while syntax cannot distinguish these two types of adverbials, their predicate-argument structure and interpretation shows that only discourse adverbials function semantically as discourse connectives.

The syntax and semantics of most discourse adverbials have not been well studied. Generally, only a small subset (those that occur frequently) have been addressed at all. At the clause level these are usually designated as the domain of discourse-level research, and at the discourse level the focus is frequently on the discourse relation they "signal". Our investigation sheds light on the space of relations imparted by a much wider variety of adverbials.

In our analysis we draw on clause-level research into the semantics of adverbials and other sub-clausal constituents. We use prior research on discourse deixis to study both the semantic nature of the arguments of adverbials and the syntactic constituents from which they can be derived. We present a wide variety of discourse and clausal adverbials. We show that discourse adverbials function semantically as discourse connectives because they contain semantic arguments that may or may not be syntactically overt, but whose interpretation requires an abstract object interpretation of a contextual constituent. We show that clausal adverbials do not function semantically as discourse connectives because the interpretations of their semantic arguments do not require the abstract object interpretation of a contextual constituent, although they may make anaphoric reference to other contextual interpretations. We further show how the predicate-argument structure and interpretation of discourse adverbials can be formalized and incorporated into the syntax of the DLTAG model.

It is not only due to their predicate-argument structure and interpretation that adverbials have been classified as discourse connectives, however. We encounter in our corpus a number of adverbials that have been treated as discourse connectives despite the fact that their semantics does not require abstract object interpretations in the discourse or spatio-temporal context. We explore other explanations for how these adverbials evoke discourse context during their interpretation; in particular,
we investigate the interaction of their semantics with other semantic and pragmatic devices. We show how focus effects in S-modifying adverbials contribute to discourse coherence, and we also show how S-modifying adverbials can be used to convey Gricean implicatures.

While the semantics and pragmatics discussed here will not provide a complete account of the discourse functions of all adverbials, they will show that the analysis of adverbials can be viewed modularly: certain functions can be attributed to the semantic domain, others to the pragmatic domain, and still others to larger issues of discourse structure.

There are numerous benefits to this analysis. First, it is economical, making use of pre-existing clause-level mechanisms to build adverbial semantics at the discourse level, thereby reducing the load on inference to account for discourse interpretation (cf. [Keh95]). Second, it provides a theoretical grounding for [Kno96]'s empirical approach to studying the lexical semantics of discourse connectives, in the process showing that additional adverbials should be included in the class he isolates on the basis of intuition alone, and that some of those included there do not really belong. Third, it expands an existing model of discourse which argues that discourse structure can be built directly on top of clause structure, and thereby bridges the gap between high-level discourse theory and clause-level theory.

1.3 Thesis Outline

In this chapter, we have given a brief overview of the analyses that we present in the remainder of this thesis. The rest of this thesis is organized as follows.

In Chapter 2 we survey a variety of existing discourse theories and examine the similarities and differences between them. We discuss how, taken together, these theories serve to distinguish different modules required to build a complete interpretation of discourse. We then overview DLTAG as a further important module, one that is capable of bridging the gap between discourse-level theories and clause-level theories by treating discourse connectives as predicates and using the same syntax and semantics that builds the clause to build an intermediate level of discourse.

In Chapter 3 we investigate the semantic mechanisms that cause some adverbials to function
as discourse connectives. We discuss prior research into the semantics of adverbials and present an analysis of the S-modifying adverbials in the Penn Treebank corpus that distinguishes those adverbials that function as discourse connectives according to their predicate-argument structure and interpretation.

In Chapter 4 we show how the semantics of adverbials discussed in Chapter 3 can be incorporated into a syntax-semantic interface for DLTAG. We discuss syntax-semantic interfaces that have been proposed for clause-level grammars and related discourse grammars, and show how these interfaces can be extended to DLTAG. We further discuss the DLTAG annotation project, whose goal is to annotate the arguments of all discourse connectives, both structural and anaphoric.

In Chapter 5 we continue our analysis of how adverbials function as discourse connectives, investigating other ways, apart from their predicate-argument structure and argument resolution, in which an adverbial can be used to contribute to discourse coherence.

We conclude in Chapter 6 and discuss directions for future work.
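The corpus analysis referred to in this outline rests on extracting S-adjoined ADVP and PP constituents from Penn Treebank parses. As a rough illustration of what such an extraction looks like, the sketch below uses NLTK's bundled Treebank sample and a deliberately simplified structural pattern; the thesis itself used tgrep searches over the full WSJ and Brown corpora, so this is an assumption-laden stand-in rather than the actual extraction procedure.

    import nltk
    from nltk.corpus import treebank  # the 10% sample; requires nltk.download('treebank')

    def s_adjoined_adverbials(tree):
        """Yield ADVP/PP constituents that are immediate daughters of an S-rooted node.

        This only approximates the S-adjunction configuration discussed in
        Chapter 3; a faithful extraction needs stricter structural tests.
        """
        for node in tree.subtrees(lambda t: t.label().split("-")[0] == "S"):
            for child in node:
                if isinstance(child, nltk.Tree) and child.label().split("-")[0] in ("ADVP", "PP"):
                    yield " ".join(child.leaves())

    hits = []
    for parse in treebank.parsed_sents()[:100]:
        hits.extend(s_adjoined_adverbials(parse))
    print(len(hits), hits[:5])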
Chapter 2

Anaphora and Discourse Models

2.1 Introduction

Discourse models explain how sequences of utterances are put together to create a text. Building a coherent discourse involves more than just concatenating random utterances; in addition, the contributions of each utterance to the surrounding context must be established. Two major areas of investigation have been distinguished.

The first concerns how sub-clausal constituents obtain their meaning through relationships to entities previously evoked in a discourse. Such constituents include NPs, such as in (2.1), where the personal pronoun he refers to an entity mentioned in the prior sentence, (2.2), where the beer refers to one of the elements of the picnic in the prior sentence, and (2.3), where the demonstrative pronoun that refers to the interpretation of the prior sentence.

(2.1) Bill talked to Phillip. He got really upset.

(2.2) Bill and Mary took a picnic to the park. The beer was warm.

(2.3) Bill talked to Phillip. That made me mad.

Other examples include VPs, such as in (2.4), where the elided VP must be determined from the meaning of the prior sentence, and in (2.5), where the use of simple past tense in both sentences creates an impression of forward progression in time.

(2.4) Bill talked to Phillip. I did too.

(2.5) Bill entered the room. He began to talk.
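As a toy illustration of this first area of investigation, the sketch below records entities as they are evoked and resolves a pronoun to the most recent gender-compatible one. Everything in it is invented for illustration, and the strategy is deliberately naive: as (2.1) already shows, recency alone cannot decide between Bill and Phillip, which is exactly why the discourse models surveyed in this chapter impose much richer constraints.

    # A toy discourse model: entities are recorded as they are evoked, and a
    # pronoun is resolved by searching the list most-recent-first.
    entities = []

    def evoke(name, gender):
        entities.append({"name": name, "gender": gender})

    def resolve(pronoun):
        wanted = {"he": "male", "she": "female"}[pronoun]
        for entity in reversed(entities):
            if entity["gender"] == wanted:
                return entity["name"]
        return None

    # (2.1) "Bill talked to Phillip. He got really upset."
    evoke("Bill", "male")
    evoke("Phillip", "male")
    print(resolve("he"))  # recency alone picks "Phillip"; the example is genuinely ambiguous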
The second major area of investigation concerns how clausal (and super-clausal) constituents obtain their meaning through relationships to clausal constituents in the surrounding context. To illustrate the nature of these investigations, consider the discourse in (2.6).

(2.6 a) Last summer, the Keatings traveled in Zimbabwe.
(2.6 b) Pat studied flora in the Chimanimani mountains.

In the absence of any additional context, one reader might interpret (2.6), and/or the writer's intention in producing (2.6), as a description of what the Keatings on the one hand, and Pat on the other, did the prior summer. Another reader might interpret it as contrasting what the two participants did the prior summer, e.g. the Keatings (just) traveled, whereas Pat studied.

Interactions between these two areas of investigation have also been studied. For example, suppose (2.6) is preceded and followed by other sentences, as in (2.7).

(2.7 a) Pat Keating married Maria Lopez last spring.
(2.7 b) Last summer, the Keatings traveled in Zimbabwe.
(2.7 c) Pat studied flora in the Chimanimani mountains.
(2.7 d) That was a spectacular celebration.

Due to the addition of (a), the reader will likely determine that Pat is a member of the Keating family. S/he might thus interpret Pat's studying as an elaboration of, or even as a cause of, the Keatings' traveling, or s/he might simply interpret Pat's studying as occurring after the Keatings' traveling. World knowledge or inference may yield the belief that the Chimanimani mountains are located in Zimbabwe. Note that the demonstrative reference in (d) is hard to resolve to the marriage described in (a) unless we move it to a position immediately following (a) in the discourse.

A complete model of discourse must account for all of these relationships and their interactions. In particular, a discourse model must characterize:

• the properties of the constituents that are being related
• the type of relationships that can exist between these constituents
• the mechanisms underlying these relationships
    • r the constraints on the application of these mechanisms In the following sections we will survey a variety of existing discourse models in terms of their coverage of the above characterizations. By taking a roughly chronological approach, and examining the benefits and limitations of each subsequent model in terms of how it incorporates those prior to it, these characterizations will be fleshed out, and it will be shown that, taken together, each theory serves to distinguish different modules required to build a complete interpretation of discourse. We then introduce DLTAG as an important module capable of bridging the gap between discourse level theories and clause level theories. 2.2 2.2.1 Descriptive Theories of Discourse Coherence An Early Encompassing Description [HH76] early proposed that a single underlying factor, which they call cohesion, unifies sequences of sentences to create a discourse. Cohesion is defined as the “semantic relations between successive linguistic devices in a text, whereby the interpretation of one presupposes the interpretation of the other in the sense that it cannot be effectively decoded except by recourse to it”([HH76] p.4)1 . They distinguish five classes of cohesion, shown in Figure 2.1. Figure 2.1: [HH76]’s Types of Cohesion Reference is a semantic relation achieved by the use of a cataphoric or anaphoric reference item to signal that the appropriate instantial meaning be supplied. Personal reference (signaled by per1 This use of the term “presupposition” is not equivalent to semantic presupposition; the latter depends on truth valuation and the former does not. Both [HH76] and [Sil76] define “discourse”, or “pragmatic”, presupposition as the relationship of a linguistic form to its prior context; Silverstein adds that a pragmatic presupposition is what a language user must know about the context of use of a linguistic signal in order to interpret it [Sil76, 1]. See Chapter 5 for further discussion of presupposition. 8
personal pronouns and determiners, e.g. I, my), demonstrative reference (signaled by demonstratives, e.g. this, that), and comparative reference (signaled by certain nominal modifiers, e.g. same, and verbal adjuncts, e.g. identically) are distinguished, and exemplified in italics in (2.8).

(2.8) John saw a black cat, but that doesn't mean it was the same black cat he saw before.

Lexical cohesion is a semantic relation achieved by the successive use of vocabulary items referring to the same entity or event, including definite descriptions, repetitions, synonyms, superordinates, general nouns, and collocation. Every lexical item can be lexically cohesive; this function is established by reference to the text. In [HH76]'s example, shown in (2.9), there are definite descriptions: a pie...the pie, repetitions: pie...pie, general nouns and synonyms: a pie...a dainty dish, and super-ordinates: blackbirds...birds.

(2.9) Sing a song of sixpence, a pocket full of rye,
Four-and-twenty blackbirds baked in a pie,
When the pie was opened, the birds began to sing,
Wasn't that a dainty dish to set before a king?

Substitution and Ellipsis are grammatical relations, which can be nominal, verbal, or clausal. The substitute must be of the same grammatical class as the item for which it substitutes, and ellipsis is substitution by zero ([HH76, 89]). In (2.10), nominal one is a substitute, and there is ellipsis of the embedded predicate in the final clause.

(2.10) Mary covets two things. Her money will be the first one to leave her. Her husband will be the next 0.

Conjunction is a semantic relation usually achieved by the use of conjunctive elements, whose meaning presupposes the presence of other propositions in the discourse and specifies the way they connect to the proposition that follows. Italicized examples are shown in (2.11). [HH76] distinguish four main types of relations between propositions, shown in Table 2.1. These relations are further subdivided, and an orthogonal distinction is made between external and internal relations; the former hold between elements in the world (referred to in the text), and the latter between text elements themselves, such as speech acts.
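Before turning to (2.11) and the relation inventory in Table 2.1, a crude sketch of the surface side of these five cohesion classes: some are signalled by small closed classes of words, while others (lexical cohesion, ellipsis) depend on vocabulary choice or omission and cannot be caught by a word list at all. The lists below contain only the handful of signals mentioned above, not [HH76]'s actual inventories, and the detector is deliberately naive.

    # A few of [HH76]'s cohesion classes, with tiny illustrative signal lists.
    COHESION_SIGNALS = {
        "reference":    ["he", "she", "it", "this", "that", "same", "identically"],
        "substitution": ["one", "do"],
        "conjunction":  ["because", "so", "but", "then"],
        # lexical cohesion and ellipsis are not signalled by a closed word list
    }

    def naive_cohesive_devices(sentence):
        """Crude word-list lookup; real analysis needs syntax and reference resolution."""
        words = sentence.lower().replace(",", " ").replace(".", " ").split()
        return {w: cls for cls, signals in COHESION_SIGNALS.items() for w in words if w in signals}

    print(naive_cohesive_devices(
        "Because it snowed heavily, the battle was not fought, so the soldiers went home."))
    # {'it': 'reference', 'because': 'conjunction', 'so': 'conjunction'}
    # (note the false hit on the expletive "it", one of many reasons word lists are not enough)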
(2.11) Because it snowed heavily, the battle was not fought, so the soldiers went home.

Table 2.1: Main Categories of [HH76]'s Relations between Propositions

  ADDITIVE: complex, apposition, comparison
  ADVERSATIVE: contrastive, correction, dismissal
  CAUSAL: specific, conditional, respective
  TEMPORAL: sequential, simultaneous, conclusive, correlative

2.2.2 Alternative Descriptions of Propositional Relations

In comparison to [HH76], [Lon83]'s study of discourse coherence distinguishes between predications expressed by clauses, which he models with predicate calculus, and relations on the predications expressed by clauses, which he characterizes into two main types, shown in Table 2.2: the "basic" operations of propositional calculus, supplemented by temporal relations, and a set of elaborative relations. These relations are further subdivided, and an orthogonal distinction is made between non-frustrated and frustrated relations, the latter being the case when an expected relation is not satisfied by the assertions in the text. Unlike [HH76], [Lon83] does not emphasize a correlation between these relations and surface signals in the text; rather, they are meant to categorize the "deep" relations underlying the surface structure of discourse.

Table 2.2: Main Categories of [Lon83]'s Relations between Propositions

  BASIC: conjoining (∧), alternation (∨), implication (→), temporal
  ELABORATIVE: paraphrase, illustration, deixis, attribution

More recently, [Mar92] has proposed an alternative set of relations between propositions, shown in Table 2.3, in which four main types are distinguished. These relations are further subdivided, and orthogonal distinctions are made between internal and external relations, and paratactic, hypotactic, and neutral relations. The first dimension is taken from [HH76], and the latter dimension roughly
corresponds to coordinating, subordinating, and variably coordinating and subordinating relations, respectively. Like [HH76], Martin uses explicit signals to derive his set of discourse relations, but like [Lon83], he defends the claim that they represent "deep" relations underlying the surface structure. He combines the two approaches by using an insertion test: a "deep" relation exists at a place in the text if an explicit signal can be inserted there. Nevertheless, his set is different from both [HH76] and [Lon83].

Table 2.3: Main Categories of [Mar92]'s Relations between Propositions

  ADDITIVE: addition, alternation
  COMPARATIVE: similarity, contrast
  TEMPORAL: simultaneous, successive
  CONSEQUENTIAL: purpose, concession, condition, manner, consequence

[SSN93] take a psychological approach, identifying the basic cognitive resources underlying the production of discourse relations. Four cognitive primitives are identified, according to which discourse relations can be classified, which they exemplify using explicit cue phrases. [SSN93] cite a number of psychological experiments to support these features.

• basic operation: Each relation creates either an additive (and) or a causal (because) connection between the related constituents.

• source of coherence: Each relation creates either semantic or pragmatic coherence; in the first case the propositional content of the constituents is related, in the second case the illocutionary force of the constituents is related.

• order of segments: Causal relations may have the causing segment to the left or the right of the result.

• polarity: A relation is negative if it links the content of one segment to the negation of the content of the other segment (although), and positive otherwise.
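One way to see what a primitive-based inventory buys is to treat each relation as a bundle of feature values and compare relations feature by feature. The sketch below is purely illustrative: the dataclass and the sample feature assignments are ours, not an annotation scheme from [SSN93], and a given connective can of course take different values in different contexts.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class RelationProfile:
        """A coherence relation characterized by [SSN93]'s four primitives."""
        basic_operation: str      # "additive" or "causal"
        source_of_coherence: str  # "semantic" or "pragmatic"
        order_of_segments: str    # for causal relations: "cause-first" or "cause-second"
        polarity: str             # "positive" or "negative"

    # Illustrative profiles for typical uses of three cue phrases.
    profiles = {
        "and":      RelationProfile("additive", "semantic", "n/a", "positive"),
        "because":  RelationProfile("causal", "semantic", "cause-second", "positive"),
        "although": RelationProfile("causal", "semantic", "cause-first", "negative"),
    }

    # Relations can now be compared primitive by primitive.
    print(profiles["because"].basic_operation == profiles["although"].basic_operation)  # True
    print(profiles["because"].polarity, profiles["although"].polarity)                  # positive negative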
[Hob90] takes a computational approach, identifying relations between propositions according to the kind of inference that is required to identify them. The main categories are shown in Table 2.4. Respectively, these categories distinguish inference about causality between events in the world, inference about the speaker's goals, inference about what the hearer already knows, and inference that a hearer is expected to be able to make about relationships between objects and predicates in the world. [Hob90] suggests that inference should be viewed as a recursive mechanism; when two propositions are linked by a relation, they form a unit which itself can be related to other units, thereby building an interpretation of the discourse as a whole.

Table 2.4: Main Categories of [Hob90]'s Relations between Propositions

  Occasion: cause, enablement
  Evaluation
  Ground-Figure: background, explanation
  Explanation: parallel, generalization, exemplification, contrast

2.2.3 Discourse Relations as Constraints

[Keh95] reformulates [Hob90]'s relations between propositions into three main types of more general "discourse relations": Contiguity, Cause-Effect, and Resemblance, which he defines in terms of constraints on both clausal and sub-clausal properties of discourse units S1 and S2. He then specifies how an inference mechanism can be used to derive Cause-Effect and Resemblance relations, and shows how they interact with sub-clausal coherence. Like [HH76] and [Mar92], he correlates these relations with the presence of cue phrases, suggesting that they could be treated as bearing semantic features that interact with the discourse inference process.

Narration is the only Contiguity relation Kehler defines. Exemplified in (2.12), the constraint on its derivation is that a change of state for a system of entities be inferred from S2, where the initial state for this system is inferred from S1.

(2.12) Bill picked up the speech. He began to read.

Kehler notes that the full set of constraints governing the recognition of a Narration relation is not well understood, but he refutes [HH76, Lon83]'s treatments, which equate it with temporal progression, citing [Hob90]'s example (2.13), whose interpretation requires the additional inference that Bush is on the train, or that the train arrival is somehow relevant to him.
(2.13) At 5:00 a train arrived in Chicago. At 6:00 George Bush held a press conference.

Kehler distinguishes four types of Cause-Effect relations, all of which must satisfy the constraint that a presupposed path of implication be inferred between a proposition P from S1 and a proposition Q from S2. Each type and the implication it requires is shown in Table 2.5, along with correlated cue phrases.

Table 2.5: [Keh95]'s Cause-Effect Relations

Relation              | Presupposition | Conjunctions
Result                | P → Q          | and (as a result), therefore
Explanation           | Q → P          | because
Violated Expectation  | P → ¬Q         | but
Denial of Preventer   | Q → ¬P         | despite, even though

To take two examples, a Result relation is inferred when Q is recognized as normally following from P. In (2.14), being a politician normally implies being dishonest.

(2.14) Bill is a politician, and therefore he's dishonest.

Denial of Preventer relations are inferred when ¬P is recognized as normally following from Q (example (2.15)).

(2.15) Bill is honest even though he's a politician.

Kehler distinguishes six types of Resemblance relations, all having the constraint that a common or contrasting relation p be inferred between S1 and S2, such that p subsumes p1 and p2, where p1 applies over a set of entities a1,...,an from S1, and p2 applies over a set of entities b1,...,bn from S2. Certain Resemblance relations also have the constraint that a property vector q be inferred, such that q consists of common or contrasting properties qi, which hold for ai and bi, for all i. (Footnote 2: Kehler notes that Elaboration relations are a limiting case of Parallel relations, where the similar entities ai and bi are identical.) Table 2.6 provides the constraints for each Resemblance relation and its correlated cue phrase.

For example, Exemplification holds between a general statement followed by an example of the generalization.
In (2.16), a1 and b1 correspond to the meanings of young aspiring politicians and John, while p1 and p2 correspond to the meanings of support and campaign for respectively. (Footnote 3: Although not discussed by Kehler, the subsuming property here is equatable with p1, and p2 can be recognized as a member of it.) Generalization is identical to Exemplification, except that the order of the clauses is reversed.

(2.16) Young aspiring politicians often support their party's presidential candidate. For instance, John campaigned hard for Clinton in 1992.

Table 2.6: [Keh95]'s Resemblance Relations

Relation        | Constraints                               | Conjunctions
Elaboration     | p1 = p2; ai = bi                          | in other words
Exemplification | p1 = p2; bi ∈ ai or bi ⊂ ai               | for example
Generalization  | p1 = p2; ai ∈ bi or ai ⊂ bi               | in general
Parallel        | p subsumes p1 and p2; qi(ai) and qi(bi)   | and
Contrast (i)    | p1 = ¬p2; qi(ai) and qi(bi)               | but
Contrast (ii)   | p1 = p2; qi(ai) and ¬qi(bi)               | but

Parallel relations require the relations expressed by the sentences and the corresponding entities to be recognized as sharing a common property. In (2.17), p1 and p2 correspond to the meanings of organized rallies for and distributed pamphlets for respectively; p corresponds to the meaning of do something to support. a1 and b1 correspond to the meanings of John and Fred, which share the common property q1 that they are people relevant to the conversation. Contrast relations require either the relations expressed by the sentences (example (2.18)) or the corresponding entities (example (2.19)) to be recognized as contrasting.

(2.17) John organized rallies for Clinton, and Fred distributed pamphlets for him.
(2.18) John supported Clinton, but Mary opposed him.
(2.19) John supported Clinton, but Mary supported Bush.
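Because the Cause-Effect constraints in Table 2.5 amount to a presupposed implication between the two clause-level propositions, they can be written down almost verbatim. The following sketch is our own illustration of that encoding, not Kehler's formalism.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class CauseEffectRelation:
        name: str
        antecedent: str        # the proposition presupposed to imply the other
        consequent: str        # "~" marks negation, as in Table 2.5
        cue_phrases: tuple

    # Direct encoding of Table 2.5: P comes from S1, Q from S2.
    CAUSE_EFFECT = (
        CauseEffectRelation("Result",               "P", "Q",  ("and", "as a result", "therefore")),
        CauseEffectRelation("Explanation",          "Q", "P",  ("because",)),
        CauseEffectRelation("Violated Expectation", "P", "~Q", ("but",)),
        CauseEffectRelation("Denial of Preventer",  "Q", "~P", ("despite", "even though")),
    )

    def candidates_for_cue(cue: str):
        """Relations whose correlated cue phrases include the given connective."""
        return [r for r in CAUSE_EFFECT if cue in r.cue_phrases]

    # candidates_for_cue("because") -> [Explanation], whose presupposition Q -> P
    # is exactly what the abduction step in Section 2.2.4 must establish.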
2.2.4 Abducing Discourse Relations by Applying the Constraints

Kehler's constraints are formulated in terms of two operations from artificial intelligence: (1) identifying common ancestors of sets of objects with respect to a semantic hierarchy (Resemblance relations), and (2) computing implication relationships with respect to a knowledge base (Cause-Effect relations). Kehler distinguishes two steps in the discourse inference process:

(a) Identify and retrieve the arguments to the discourse relation

This step is achieved via the sentence interpretation; Kehler uses a formalism related to the version of Categorial Semantics described in [Per90], in which sentence interpretation results in a syntactic structure annotated with the semantic representation of each constituent. These semantic representations are arguments to the discourse relation, and are identified and retrieved via their corresponding syntactic nodes. Cause-Effect relations require only the identification of the sentential-level semantics for the clauses as a whole (i.e. P and Q). Resemblance relations require that the semantics of sub-clausal constituents be accessed, in order to identify p1 and p2, and the ai and bi.

(b) Apply the constraints of the relation to those arguments

The second step, Kehler suggests, could be achieved for Resemblance relations using comparison and generalization operations such as those proposed in [Hob90] and elsewhere, while [HSAM93]'s logical abduction interpretation method could be used to abduce the presupposition for the Cause-Effect relations. [HSAM93]'s method could further determine with what degree of plausibility the constraints are satisfied such that a particular relation holds.

In [HSAM93]'s framework, discourse relations between discourse units are proved (abduced) using world and domain knowledge, via a procedure of axiom application. Each discourse unit is a segment, as defined by axiom (2.20), which states that if w is a string of words constituting a sentence and e is its assertion or topic, then w is a discourse segment.

(2.20) (∀ w, e) s(w, e) → Segment(w, e)

When a discourse relation holds between two segments, the resulting structure is also a segment, yielding a hierarchical discourse structure, as captured by axiom (2.21): if w1 and w2 are segments whose assertions or topics are respectively e1 and e2, and a discourse (coherence) relation holds between the content of w1 and w2, then the string w1w2 is also a segment. The argument e of CoherenceRel is the assertion or topic of the composed segment, as determined by the definition of the discourse relation.

(2.21) (∀ w1, w2, e1, e2, e) Segment(w1, e1) ∧ Segment(w2, e2) ∧ CoherenceRel(e1, e2, e) → Segment(w1w2, e)
To interpret a discourse W, therefore, one must prove the expression:

(2.22) (∃ e) Segment(W, e)

We use as an example a variant of that found in [Keh95]:

(2.23) John is dishonest. He's a politician.

To interpret this discourse, it must be proven a segment, by establishing the three premises in axiom (2.21). The first two premises are established by (2.20); it therefore remains to establish a discourse relation. Because Explanation is a defined discourse relation, we have the following axiom:

(2.24) (∀ e1, e2) Explanation(e1, e2) → CoherenceRel(e1, e2, e1)

In explanations, Hobbs notes, it is the first segment that is explained; therefore it is the dominant segment and its assertion, e1, will be the assertion of the composed segment, i.e. the third argument of CoherenceRel in (2.24). Recall that the constraints defined by Kehler on Explanation relations were that Q → P be presupposed; in Hobbs' terms, the presupposition cause(e2, e1) must be abduced, as expressed by the following axiom:

(∀ e1, e2) cause(e2, e1) → Explanation(e1, e2)

In other words, to abduce an Explanation relation, what is asserted by e2 must be proven to be the cause of e1. In [HSAM93], utterances, like discourse relations, are interpreted by abducing their logical form, using axioms that are already in the knowledge base, are derivable from axioms in the knowledge base, or can be assumed at a cost corresponding to some measure of plausibility. Assume we have abduced the following axiom:

(2.25) (∀ x, e2) Politician(e2, x) → (∃ e1) Dishonest(e1, x) ∧ cause(e2, e1)

That is, if e2 is a state of x being a politician, then that will cause the state e1 of x being dishonest. The plausibility measure that is assigned to this formula will be inversely proportional to the cost assigned to an Explanation relation. Assuming (2.25) has a high plausibility in our knowledge base, then in the logical forms of the two sentences in (2.23), John (and he) can be identified with x, cause(e2, e1) proven thereby, and Explanation will be viewed as the likely relation between the two sentences. (Footnote 4: This explanation is from [Keh95, 18] and [Lag98]; see [HSAM93] for further details.)
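The flavour of this cost-based abduction can be sketched very roughly in code. The rules, costs, and string-matching "logic" below are our own toy simplifications (there is no real unification here), not the machinery of [HSAM93]; the point is only that the Explanation reading becomes cheap once an axiom like (2.25) is available.

    def prove(goal, facts, rules, assume_cost):
        """Minimum cost of establishing `goal`: free if it is a known fact,
        the cost of a rule plus the cost of proving its antecedents if one
        concludes `goal`, or the price of simply assuming it."""
        if goal in facts:
            return 0.0
        best = assume_cost.get(goal, float("inf"))
        for consequent, antecedents, rule_cost in rules:
            if consequent == goal:
                cost = rule_cost + sum(prove(a, facts, rules, assume_cost) for a in antecedents)
                best = min(best, cost)
        return best

    # Toy knowledge base for (2.23) "John is dishonest. He's a politician."
    facts = {"Politician(e2, john)", "Dishonest(e1, john)"}
    rules = [
        # stand-in for axiom (2.25): the politician state plausibly causes the dishonest state
        ("cause(e2, e1)", ["Politician(e2, john)", "Dishonest(e1, john)"], 0.2),
    ]
    assume_cost = {"cause(e2, e1)": 1.0}   # assuming the causal link outright is expensive

    # Explanation(e1, e2) requires cause(e2, e1); the cheaper that proof,
    # the more plausible the Explanation relation.
    print(prove("cause(e2, e1)", facts, rules, assume_cost))   # -> 0.2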
2.2.5 Interaction of Discourse Inference and VP Ellipsis

[Keh95] shows how the discourse inference process for Resemblance relations interacts with VP ellipsis differently than the discourse inference process for Cause-Effect relations does, based on the different constraints they require to be satisfied by the clauses they are inferred between. In particular, the arguments to Resemblance relations are sets of parallel entities and relations. Therefore, the discourse inference process must access sub-clausal constituents in identifying and retrieving those arguments, including the missing constituent in VP ellipsis. In contrast, the arguments to Cause-Effect relations are propositions. Therefore the inference process need not access sub-clausal constituents. This difference accounts for different felicity judgments concerning VP ellipsis displayed across the two types of relations.

To exemplify his analysis, consider example (2.26), in which a Parallel relation can be inferred between the two clauses:

(2.26) Bill became upset, and Hillary did too.

To establish a Parallel relation (see the Resemblance Relation definitions in Table 2.6), p(a1, a2, ...) must be inferred from S1, and p(b1, b2, ...) must be inferred from S2, where for some property vector q, qi(ai) and qi(bi), for all i. The identification of these arguments requires the elided material to be recovered and reconstructed in the elided VP (see [Keh95] for details of the process of reconstruction). Compare (2.26), however, with (2.27).

(2.27) *The problem was looked into by Billy, and Hillary did too.

Again, to establish a Parallel relation between the two clauses, the arguments must be identified, requiring the elided material to be recovered and reconstructed in the elided VP. But in this case the
recovery of was looked into creates a mismatch of syntactic form when it is reconstructed in the elided VP, resulting in an infelicitous discourse.

Such infelicity does not occur, however, when there is a Cause-Effect relation between two similar clauses, as in example (2.28):

(2.28) The problem was to have been looked into, but obviously nobody did.

Kehler argues that because establishing a Violated Expectation relation (see the Cause-Effect Relation definitions in Table 2.5) requires only that a proposition P be inferred from S1, and a proposition Q be inferred from S2 (where normally P → ¬Q), the elided VP need not be reconstructed in the syntax, but can be recovered through anaphora resolution. The result is that the discourse is felicitous. (Footnote 5: Kehler does not address the fact that (2.27) remains infelicitous with a Cause-Effect relation, e.g. The problem was looked into by Billy, but Hillary didn't.)

2.2.6 Summary

In this section, we have seen the early delineation of different types of coherence proposed by Halliday and Hasan reflected in subsequent theories of discourse coherence, which we will see further below. The comparison of the set of propositional relations proposed by Halliday and Hasan with those proposed in other descriptive theories highlights the lack of agreement in the literature about how an important aspect of discourse coherence should be described. As we will continue to see in the following sections, though most models make use of explicit signals to characterize discourse relations, there still exists considerable variation in the number and type of discourse relations each model defines. What distinguishes each model is the degree and manner with which it associates its postulated set of discourse relations with mechanisms that produce them, and how it constrains the application of these mechanisms. Kehler's attempt to define relations between discourse units in terms of constraints which those units must satisfy, to demonstrate how their satisfaction can be determined using the logical abduction method of Hobbs et al., and to show how this satisfaction interacts with sub-clausal coherence, is a first exemplification of such an association. We will see others below, and in the final section we will see a way in which these various approaches can be
simplified.

2.3 A Three-Tiered Model of Discourse

2.3.1 The Three Tiers

[GS86] present a theory of discourse that distinguishes three interacting components: the linguistic structure, the intentional structure, and the attentional state. The linguistic structure represents the structure of sequences of utterances, i.e. the structure of segments into which utterances aggregate. This structure is not strictly compositional, because a segment may consist of embedded subsegments as well as utterances not in those subsegments. This structure is viewed as akin to the syntactic structure of individual sentences ([GS86], footnote 1), although the boundaries of discourse segments are harder to distinguish (see [GS86, FM02] for references to studies investigating these boundaries).

The intentional structure represents the structure of discourse segment purposes (DSPs), i.e. the functions of each discourse segment, whose fulfillment leads to the fulfillment of an overall discourse purpose (DP). DPs and DSPs are distinguished from other intentions by the fact that they are intended to be recognized. Non-DP/DSP intentions, such as a speaker's intention to use certain words, or to impress or teach the hearer, are private, i.e. not intended to contribute to discourse interpretation. Examples of DPs and DSPs include intending the hearer to perform some action, intending the hearer to believe some fact, and intending the hearer to identify some object or property of an object. As these examples imply, the set of intentions that can serve as DSPs and DPs is infinite, although it remains an open question whether there is a finite description of this set. However, [GS86] argue that there are only two structural relations which can hold between DSPs and their corresponding discourse segments. If the fulfillment of a DSP A provides partial fulfillment of a DSP B, then B dominates A. If a DSP A must be fulfilled before a DSP B, then A satisfaction-precedes B. Because a hearer cannot know the whole set of intentions that might serve as DSPs, what they recognize, [GS86] argue, is the relevant structural relations between them.

The attentional state is viewed as a component of the cognitive state, which also includes the
knowledge, beliefs, desires and private intentions of the speaker and hearers. The attentional state is inherently dynamic, and is modeled by a stack of focus spaces, each consisting of the objects, properties and relations that are salient in each DSP, as well as the DSP itself. Changes in the attentional state arise through the recognition of the structural relations between DSPs. In general, when the DSP for a new discourse segment contributes to the DSP for the immediately preceding segment, it will be pushed onto the stack; when the new DSP contributes to some intention higher in the dominance hierarchy, several focus spaces are popped from the stack before the new one is pushed. One role of the stack is to constrain the possible DSPs considered as candidates for structural relations with the incoming DSP; only DSPs in the stack and in one of the two structural relations are available. Another role of the stack is to constrain the hearer's search for possible referents of referring expressions in an incoming utterance; the focus space containing the utterance will provide the most salient referents. Figure 2.2 illustrates the major aspects of the model.

Figure 2.2: Illustration of [GS86]'s Discourse Model

On the left of the figure, a sequence of five utterances is divided into DSs, where DS1 includes both DS2 and DS3, as well as Utterance1 and Utterance5, which are not included in either DS2 or DS3. As shown in (a), the focus space FS1 containing DSP1 and the objects, properties and relations so far identified in DS1 is pushed on the stack. Because DSP1 is identified as dominating DSP2, FS2 is also pushed onto the stack. In (b), DSP2 is identified as being in a satisfaction-precedes relationship with DSP3; FS2 is thus popped from the stack before FS3 is pushed onto the stack.

[GS86] argue that the hearer makes use of three pieces of information when determining the
segments, their DSPs, and the structural relationships between them. First, linguistic expressions, including cue phrases and referring expressions as well as intonation and changes in tense and aspect, are viewed as primary indicators of discourse structure, even as the attentional structure constrains their interpretations. [GS86] argue that while linguistic expressions cannot indicate what intention is entering into focus, they can provide partial information about changes in attentional states, whether this change returns to a previous focus space or creates a new one, how the intention in the containing discourse segment is related to other intentions, and structural relations between segments. They exemplify such uses of linguistic expressions as shown in Table 2.7.

Table 2.7: [GS86] Changes in Discourse Structure Indicated by Linguistic Expressions

Attentional Change (push): now, next, that reminds me, and, but
Attentional Change (pop to): anyway, but anyway, in any case, now back to
Attentional Change (complete): the end, ok, fine, paragraph break
True Interruption: I must interrupt, excuse me
Flashbacks: Oops, I forgot
Digressions: By the way, incidentally, speaking of, Did you hear about..., that reminds me
Satisfaction-precedes: in the first place, first, second, finally, moreover, furthermore
New Dominance: for example, to wit, first, second, and, moreover, furthermore, therefore, finally

Second, the hearer makes use of the utterance-level intentions of each utterance [Gri89] to determine the DSP of each discourse segment. The DSP may be identical to some utterance-level intention in a segment, as in a rhetorical question, whose intention is to cause the hearer to believe the proposition conveyed in the question. Alternatively, the DSP may be some combination of the utterance-level intentions, as in a set of instructions, where the intention of the speaker is that all of them be completed.

Third, shared knowledge between the speaker and hearer about the objects and actions in the stack can help determine the structural relations between utterances and the intentions underlying them. [GS86] propose two relationships concerning objects and actions that a hearer uses. A supports relation holding between propositions may indicate dominance in one direction, while
a generates relation holding between propositions may indicate dominance in another direction. They leave as an open question how these relations between objects are computed, but view them as more basic versions of the possible relations between propositions proposed by [HH76] and others. Together, this information enables a hearer to reason out the DSPs and DP in a discourse.

2.3.2 Coherence within Discourse Segments

Within each discourse segment, Centering Theory (CT) [WJP81] is a model of sub-clausal discourse coherence which tracks the movement of entities through each focus space by one of four possible focus shifts. In CT, each discourse segment consists of utterances designated as Ui. Each utterance Ui evokes a list of discourse entities, the forward-looking centers, Cf(Ui). The highest-ranked entity in Cf(Ui-1) that is realized in Ui is the backward-looking center, Cb(Ui). The highest-ranked entity in Cf(Ui) is the preferred center, Cp(Ui). The realize relation is defined in [WJP81] as follows: an utterance U realizes a center c if c is an element of the situation described by U, or c is the semantic interpretation of some subpart of U. Ranking of the members of the Cf list is language-specific; in English the ranking is as follows:

Subject > Indirect Object > Direct Object > Other

Four types of transitions are defined to reflect variations in the degree of topic continuity and are computed according to Table 2.8:

Table 2.8: Centering Theory Transitions

                     Cb(Ui) = Cb(Ui-1)    Cb(Ui) ≠ Cb(Ui-1)
Cb(Ui) = Cp(Ui)      Continue             Smooth-Shift
Cb(Ui) ≠ Cp(Ui)      Retain               Rough-Shift

Discourse coherence is then computed according to the following transition ordering rule: Continue is preferred to Retain, which is preferred to Smooth-Shift, which is preferred to Rough-Shift.
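The transition computation in Table 2.8 is mechanical, and can be sketched as follows; the utterance representation (a ranked Cf list, with the previous utterance's highest-ranked center standing in for Cb(Ui-1)) is our own simplification for illustration.

    def transition(cf_prev, cf_curr):
        """Compute the Centering transition between two utterances, each given
        as a ranked list of forward-looking centers (highest-ranked first).
        Cb(Ui) is the highest-ranked element of Cf(Ui-1) realized in Ui;
        Cp(Ui) is the highest-ranked element of Cf(Ui)."""
        cb = next((e for e in cf_prev if e in cf_curr), None)
        cp = cf_curr[0]
        cb_prev = cf_prev[0] if cf_prev else None   # simplification: stand-in for Cb(Ui-1)
        if cb == cb_prev:
            return "Continue" if cb == cp else "Retain"
        return "Smooth-Shift" if cb == cp else "Rough-Shift"

    # Illustrative Cf lists for the example below:
    # "He washed the windows as Dick waxed the car."  Cf = [Jeff, windows, Dick, car]
    # "He soaped a pane."                              Cf = [Jeff, pane]
    print(transition(["Jeff", "windows", "Dick", "car"], ["Jeff", "pane"]))    # Continue
    # "He buffed the hood.", with he resolved to Dick: Cf = [Dick, hood]
    print(transition(["Jeff", "windows", "Dick", "car"], ["Dick", "hood"]))    # Smooth-Shift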
CT models discourse processing factors that explain the difference in the perceived coherence of discourses such as (2.29) and (2.30).

(2.29a) Jeff helped Dick wash the car.
(2.29b) He washed the windows as Dick waxed the car.
(2.29c) He soaped a pane.

(2.30a) Jeff helped Dick wash the car.
(2.30b) He washed the windows as Dick waxed the car.
(2.30c) He buffed the hood.

CT predicts that (2.30) is harder to process than (2.29): though initially in both discourses the entity realized by Jeff is established as the Cb, utterance (2.30c) causes a Smooth-Shift in which the Cb becomes the entity realized by Dick, because the buffing event is a subpart of the waxing event. The predicted preference for a Continue (which actually occurs in (2.29c)) means that the hearer first interprets the pronoun he in (2.30c) as the Cp(Ui-1) and then revises this interpretation.

2.3.3 Modeling Linguistic Structure and Attentional State as a Tree

[Web91] argues that a tree structure and insertion algorithm can serve as a formal analogue of both on-line recognition of discourse structure and changes in attentional state, thereby removing the need to postulate a separate stack for focus spaces, while retaining the distinction between text structure, intentional structure, and attentional state. Webber's model assumes a one-to-one mapping between discourse segments and tree nodes, with a clause constituting the minimal unit. In this way the linguistic structure is represented compositionally. Each node in the tree is associated with the entities, properties and relations conveyed by the discourse segment it represents.

When the information in a new clause C is to be incorporated into an existing discourse segment DS, C is incorporated into the tree by the operation of attachment, which adds the C node as a child of the DS node, and adds the information conveyed by C to the DS node. This operation is illustrated in Figure 2.3. (a) shows the tree before node 3 is attached, while (b) shows the tree after node 3 is attached. Note that the information associated with node 3 is represented in node 3 and incorporated into the discourse segment (1,2,3) it has attached to.

Figure 2.3: Illustration of [Web91]'s Attachment Operation

When the information in a new clause C is combined with the information in an existing discourse segment to compose a new discourse segment DS, C is incorporated into the tree by the
operation of adjunction, which makes C and DS the children of a new node, and adds the information conveyed by C and DS to the new node. This operation is illustrated in Figure 2.4. (a) shows the tree before node 3 is adjoined, while (b) shows the tree after node 3 is adjoined. Note that the information associated with node 3 is incorporated, along with the information associated with node (1,2) (which was also created by adjunction), into the new node ((1,2),3).

Figure 2.4: Illustration of [Web91]'s Adjunction Operation

Both of these operations are restricted to applying to nodes on the right frontier of the discourse tree. Formally, the right frontier is the smallest set of nodes containing the root such that whenever a node is in the right frontier, so is its rightmost child. In this way, the tree nodes appear in the same linear order as the corresponding segments in the text. In Webber's model, the tree replaces [GS86]'s linguistic structure, and the right frontier replaces [GS86]'s attentional state, i.e. the information in each node on the right frontier represents the information in each focus space in the stack.
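The two operations and the right-frontier restriction are simple enough to state directly as code; the node representation below is our own minimal sketch of Webber's proposal, not an implementation she provides.

    class Node:
        def __init__(self, info):
            self.info = set(info)      # entities, properties, relations conveyed
            self.children = []
            self.parent = None

    def find_root(node):
        while node.parent:
            node = node.parent
        return node

    def right_frontier(root):
        """The root plus, recursively, the rightmost child of every node on the frontier."""
        frontier = [root]
        while frontier[-1].children:
            frontier.append(frontier[-1].children[-1])
        return frontier

    def attach(clause, ds):
        """Add the clause as a new rightmost child of DS and fold its information into DS."""
        assert ds in right_frontier(find_root(ds)), "attachment is restricted to the right frontier"
        ds.children.append(clause)
        clause.parent = ds
        ds.info |= clause.info

    def adjoin(clause, ds):
        """Create a new segment node whose children are DS and the clause."""
        assert ds in right_frontier(find_root(ds)), "adjunction is restricted to the right frontier"
        new = Node(ds.info | clause.info)
        parent = ds.parent
        new.children = [ds, clause]
        ds.parent = clause.parent = new
        if parent:                                  # splice the new node in place of DS
            parent.children[parent.children.index(ds)] = new
            new.parent = parent
        return new

    # Example: two clauses composed into one segment by adjunction.
    root, c2 = Node({"e1"}), Node({"e2"})
    top = adjoin(c2, root)
    # right_frontier(top) -> [top, c2]; top.info == {"e1", "e2"}

On this sketch, the focus spaces of [GS86]'s stack correspond exactly to the info sets of the nodes returned by right_frontier.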
Because the model is strictly compositional, not all nodes (discourse segments) in the tree will contain discourse segment purposes (DSPs) (e.g. Utterance1 and Utterance5 in Figure 2.2); however, all nodes on the right frontier except possibly the leaf will contain DSPs that contribute to the DP of the overall discourse (which will be contained in the root of the tree).

2.3.4 Introduction to Discourse Deictic Reference

[Lak74] first used the term "discourse deixis" to refer to uses of the demonstrative like those in (2.31)-(2.33), where the antecedent of the demonstrative can be the interpretation of a verbal predicate (2.31), the interpretation of a clause (2.32), or the interpretations of more than one clause (2.33).

(2.31) John [smiled]. He does that often.
(2.32) [John took Biology 101.] That means he can take Biology 102.
(2.33) [I woke up and brushed my teeth. I went downstairs and ate breakfast, and then I went to work.] That's all I did today.

Early studies of this phenomenon relate it to another use of demonstratives, shown in (2.34), where the antecedent is not in the discourse at all, but rather in the spatio-temporal situation. This use is called "deictic", from a Greek term meaning 'pointing' or 'indicating'.

(2.34) "Aw, that's nice, Billy!", you exclaim, when your two-year old kisses you.

In [Lyo77]'s view, discourse deixis achieves higher-order reference, where first-order reference is defined as reference to NPs, and higher-order reference is defined as reference to larger constituents interpreted as events, propositions and concepts. [Web91] distinguishes five discourse deixis interpretations, shown in Table 2.9 and exemplified in its second column, where for illustrative purposes the discourse deictic should be assumed to refer to an interpretation of the clause "John talks loudly".

Table 2.9: [Web91]'s Classification of Discourse Deictic Reference

Interpretation | Example
speech act     | that's a lie
proposition    | that's true
event          | that happened yesterday
pure textual   | repeat that
description    | that's a good description

Demonstratives are most commonly employed in English for discourse deixis purposes. Corpus studies, however, have shown the zero-pronoun used in this way in Italian [DiE89] and German [Eck98], and occasionally in English speech. [Sch85] studies roughly 2000 tokens of it and that, and finds that it is much less frequently
used than that as a discourse deictic, and that when uses of discourse deictic it do occur, they are frequently used after a discourse deictic use of that, in what Schiffman calls a "Pronoun Chain". A similar observation is made by [Web88]. [GHZ93] note more generally the tendency for it to prefer reference to focused items, while demonstrative pronouns prefer reference to activated items. For example, in (2.35), both uses of it refer to "becoming a street person"; by the second reference, this property is focused. That prefers referring to "becoming a street person would hurt his mother", which is not yet focused and which is highly dispreferred as the referent for the second it.

(2.35) John thought about becoming a street person. It would hurt his mother and it/that would make his father furious.

The oft-cited example in (2.36) shows what [GC00] and [Byr00] relatedly claim, that personal pronouns tend to refer to entities denoted by noun phrases, while demonstratives tend to refer discourse deictically. In (2.36), the referent of it is clearly "x", while the referent of that is clearly the result of "add x to y".

(2.36) Add x to y and then add it/that to z.

The preference of it to refer to entities denoted by noun phrases, and to refer to abstract objects only after they have been referred to by a demonstrative, suggests that nouns are more salient than verbs and clauses as entities. [Byr00], however, notes that the salience effects on personal pronoun resolution can be affected by what she calls "Semantic Enhancement": with enough predicate information geared toward a higher order referent, personal pronouns can be made to prefer higher order referents,
as shown in (2.38c).

(2.37) There was a snake on my desk.
(2.38a) It scared me.
(2.38b) That scared me.
(2.38c) I never thought it would happen to me. (Sem. Enh.)

[Eck98] notes a further difference between the resolution of demonstratives and personal pronouns as discourse deixis, which may indicate that topics are more salient than verbs and clauses as entities. In (2.39), that prefers reference to the specific story described by Speaker A, while it prefers reference to the topic of child-care in general. (Footnote 7: [GC00] also claim that prosody plays more of a role in resolving discourse deictic that than it.) In fact, [ES99] do not consider this use of it a discourse deictic use at all; they treat it as a "vague pronoun".

(2.39) Speaker A: She has a private baby-sitter. And, uh, the baby just screams. I mean, the baby is like seventeen months and she just screams. Even if she knows that they're getting ready to go out. They haven't even left yet...
Speaker B: Yeah, it/that's hard.

[Lad66] and others note subtle salience differences between the discourse deictic uses of this and that, related to their spatio-temporal differences: this is used when the referent is close, and that is used when the referent is far.

2.3.5 Retrieving Antecedents of Discourse Deixis from the Tree

Many researchers find that discourse deictic reference is dependent on discourse structure. [Pas91] uses (2.40) to show that the clausal referent of a discourse deictic is only available if it immediately precedes the deictic. In (d), that cannot refer to sentence (a) unless (b) and (c) are removed.

(2.40a) Carol insists on sewing her dresses from all natural materials
(2.40b) and she won't even consider synthetic lining.
(2.40c) She should try the new rayon challis.
(2.40d) *That's because she's allergic to synthetics.
[Web91] argues more formally that though deictic reference is often ambiguous (or underspecified [Pas91]), the referent is restricted to the right frontier of the growing discourse tree. She exemplifies this using (2.41)-(2.42):

(2.41a) It's always been presumed that
(2.41b) when the glaciers receded
(2.41c) the area got very hot.
(2.41d) The Folsum men couldn't adapt, and
(2.41e) they died out.

(2.42) That's what's supposed to have happened. It's the textbook dogma. But it's wrong.

The discourse deictic reference in (2.42) is ambiguous; it can refer to any of the nodes on the right frontier of the discourse: (the nodes associated with) clause (2.41e), clauses (2.41d)-(2.41e), clauses (2.41c)-(2.41e), or clauses (2.41a)-(2.41e). Discourse deictic ambiguity extends to within the clause as well [Sch85, Sto94]. For example, in (2.43), the referent of that could be any of the bracketed elements:

(2.43a) [It talks about [how to [go about [interviewing]]]]
(2.43b) and that's going to be important.

As noted by [DH95], the standard view on anaphoric processing is that we "pick up" the interpretation of the antecedent, and that in the normal case, there is a coreference relation between the antecedent and the anaphor. The coreference relation is one of identity, and the antecedent is "there", waiting to be "picked up". Thus, in (2.44), my grandfather is said to be coreferent with he:

(2.44) My grandfather was not a religious person. He even claimed there was no god.

However, the fact that the interpretations of discourse deixis are not grammaticalized as nouns prior to discourse deictic reference, and the fact that there are structural restrictions on their reference, lead some researchers to argue that they are not present as entities in the discourse model prior to discourse deictic reference. According to these researchers, their entity reading is coerced and added to the discourse model via discourse deictic reference.
Type coercion is a term taken from computer science, where it defines an operation by which an expression which is normally of one logical type is re-interpreted as another (e.g. when an integer is understood as a Boolean value). Type coercion is used to explain a range of linguistic phenomena, such as when an expression which is indeterminate as to logical type is 'coerced' into one particular interpretation and thus acquires a fixed type. Models of how coercion is achieved vary.

[Web91] argues that deictic use is an ostensive act that distinguishes what is pointed to from what is referred to, which may be the same, but need not be. This ostensive act functions to reify, or bring into the set of entities, some part of the interpretations of clauses which was not present in the set of entities prior to the ostensive act. She uses referring functions to model how the reification is achieved, because they allow the domain of what is pointed to (demonstratum) to be distinguished from the range of what is referred to (referent): f: D → R, where D is comprised of focused regions of the discourse, and R is a set of possible interpretations. (Footnote 8: Referring functions have been used by [Nun79] to model how nouns in general achieve their reference.) In (2.41), the domain of the referring functions is the elements at the right frontier of the discourse, and function application yields a range of event tokens (things that can happen). By virtue of the referring action of the function, these new 'entities' (event tokens) are added to E.

[Sto94] takes Webber's model one step further, arguing that a discourse deictic pronoun will take its referent from the rightmost sibling of the clause in which it is contained, once its clause is attached or adjoined to the tree. That the referent cannot be found in a node that dominates the node containing the discourse deictic is easy to see, because that would make the deictic self-referential, as in (2.45), where the index indicates the discourse segment whose interpretation is the referent of the discourse deictic. As the example makes clear, a discourse deictic can almost never refer to a segment in which it is contained. The only exception is textual deixis, as in (2.46), where the demonstrative can refer to the text in which it is contained.

(2.45) *[This_i is a neat idea.]_i
(2.46) [This_i is a true sentence.]_i
To argue that the referent will not be found in a node that is dominated by the node containing the discourse deictic, [Sto94] first evokes the use of discourse relations, arguing that if a discourse deictic refers to a segment, it will also be in a discourse relation with that segment. He then argues that while discourse deictic reference to embedded clauses might be viewed as an exception to this generality, this exception can be avoided by replacing Webber's use of referring functions with a possible world semantics in which the semantic interpretation of the elements at the right frontier of the discourse makes a variety of 'entity' interpretations, or "information states" (see [Kra89]), available to the discourse deictic. For example, he argues that modality in (2.47) makes available assertions about at least two information states: (1) Mary left, and (2) John thought the context asserted of (1). The discourse deictic in (2.48a) references the first information state, and that in (2.48b) references the second information state.

(2.47) John thought Mary left.
(2.48a) He thought this happened yesterday.
(2.48b) This was wrong.

[DH95] take a view similar to [Web91], except that they argue that type coercion is just one of the possible referent-creating operations evoked by the use of a discourse deictic. They argue that each time an anaphor is used, the degree to which its antecedent is "there" will vary, and the effort needed to "pick it up" will vary. In their view, traditional "coreference" is the most trivial case: the result of applying the identity relation to the antecedent's extension. They propose that at least the following operations are needed to explain how the referent of a discourse deictic is created:

- Summation and complex creation: These operations assemble sets. A set can be assembled by logical conjunction, as in (2.49), or by other discourse relations, as in (2.50) (brackets indicate the discourse where the operation creates the antecedent):

(2.49) [Interest rates rose. The recession may reduce inflation. Capital taxation is lower.] This means brighter times for those who have money to save.

(2.50) [If a white person drives this car it's a "classic". If I, a Mexican-American, drive it, it
is a "low-rider".] That hurts my pride.

- Type coercion: This operation is as above, when the semantics of an element in the clause containing the deictic causes an expression to be coerced into one particular interpretation. For example, the verb can coerce an interpretation, as in (2.51), where happen coerces an event interpretation, or the predicate nominal can coerce an interpretation, as in (2.52).

(2.51) Mary was fired. That happened last week.
(2.52) I turned left. This was a wise decision.

- Abstraction and Substitution: The abstraction operation abstracts away from specific events, as in (2.53), where the antecedent is "beating one's wife", not "Smith's beating his wife", while the substitution operation substitutes one element of the antecedent for another, as in (2.54), where the antecedent is "X beats his wife":

(2.53) Smith beats his wife although this was forbidden 50 years ago.
(2.54) Smith beats his wife and John does it too.

Regardless of whether we assume that clauses already make available a set of semantic values, or whether we use a referring function or one of any number of operations to represent how these values are made available, discourse deixis use doesn't determine which entity interpretation(s) is (are) chosen as the referent. Within the domain of the right frontier, the semantics of the clause containing the discourse deictic will determine which of the available objects are selected. In particular, as [Ash93] notes, the sub-categorization frame of the verb should restrict the possible referents. So while the embedded clause in (2.55) can be interpreted as a variety of abstract objects, thinks sub-categorizes for a proposition interpretation of "Mary is a genius", as does the complex form be certain of. Similarly, happen sub-categorizes for an event interpretation, and surprise sub-categorizes for a fact interpretation.

(2.55) John thinks that [Mary is a genius]. John is certain of it.
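This last observation, that the predicate taking the deictic constrains the abstract-object type of its referent, lends itself to a simple filtering step during resolution. The sketch below is our own illustration; the type labels and the verb-to-type table are assumptions for expository purposes.

    # Assumed mapping from selecting predicates to the abstract-object type they
    # sub-categorize for (following the discussion of [Ash93] above).
    SELECTS = {
        "happen": "event",
        "think": "proposition",
        "be certain of": "proposition",
        "surprise": "fact",
    }

    def filter_candidates(predicate, candidate_interpretations):
        """Keep only those abstract-object interpretations (drawn from the right
        frontier) whose type matches what the selecting predicate requires."""
        required = SELECTS.get(predicate)
        return [(source, typ) for source, typ in candidate_interpretations
                if required is None or typ == required]

    # For (2.55): the embedded clause could in principle be construed as several
    # abstract objects; "be certain of" keeps only the proposition reading.
    candidates = [("Mary is a genius", "proposition"), ("Mary is a genius", "event")]
    print(filter_candidates("be certain of", candidates))
    # -> [('Mary is a genius', 'proposition')]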
2.3.6 Summary

In this section, we overviewed the three-tiered model of discourse of Grosz and Sidner, in which three interacting components account for the structuring of the text into segments, each of which serves a purpose and creates a salient space containing the information relevant to that segment. Their focus on intentions as the "relations" linking discourse segments reflects a more general pragmatic approach to discourse. In some treatments discourse is viewed as a plan, structured into subgoals whose fulfillment achieves an overall goal (see [LA90] for references). Other treatments concern the role of discourse segments in argument understanding [Coh84]. Relevance Theory [SW86] is another pragmatic approach proposed as a model of a hearer's interpretation process, in which the relevance of every segment to the context is determined based on a number of interacting constraints. Current work in dialogue (cf. [SIG02]) investigates intentions in terms of "dialogue acts".

Because Grosz and Sidner view the possible intentions that can underlie a discourse as infinite, structural relations between segments play an important role in discourse coherence. By distinguishing a focus space for each segment they model sub-clausal coherence across segments, and enable Centering Theory to model sub-clausal coherence within segments. (Footnote 9: Some researchers have also used Centering as a model of anaphora resolution; see [BWFP87].)

Webber's tree-based model simplifies the Grosz and Sidner model by combining the components of text structure and attentional state into a single structure, while keeping them conceptually distinct via the notion of a right frontier, which is shown capable of modeling constraints on discourse deictic reference to the interpretation of discourse segments.

Grosz and Sidner's model (and Webber's revision) provides a detailed account of the high-level structuring of text in terms of attention and intention; relations between propositions play a subsidiary role, serving along with cue phrases and utterance-level intentions to help the reader recognize the writer's intentions for each segment. The details of how propositional relations are computed recursively to build discourse segments and how the results are represented are not provided. In the next section, we will discuss a tree-based model that adopts a similarly simple view of propositional relations, but defines their role in the construction of the tree precisely, and claims to model sub-clausal anaphora resolution.
2.4 A Tree Structure with a Syntax-Semantic Interface

[Pol96] presents a context free grammar (LDM) for incremental discourse parsing [SP88], similar to [Web91] but combined with grammar rules incorporating propositional relations, and a dynamic logic framework (DQL) for describing the structured semantic component that results from the parsing process [PSvdB94]. The resulting model provides an account of how anaphora resolution [vdB96, PvdB99, Lag98] and temporal interpretation [PvdB96] work across stretches of discourse.

2.4.1 Constituents and Tree Construction

In LDM, the surface structure of discourse is composed of discourse constituent units (DCUs) and discourse operators (DOs). DCUs are semantically motivated structures that carry propositional information; an elementary DCU, typically a clause or sentence, is any minimal utterance encoding a single event or state of affairs indexed for context, including the physical and social situation of utterance (real or modeled), genre unit, modality, polarity, and point of view. DOs express non-propositional information, such as semantic and structural relationships among DCUs, and pragmatic information about the attitude of the speaker and the situation of utterance. Examples of DOs include logical operators, vocatives, (dis)affirmations, certain particles, exclamations, connectives, and temporal modifiers.

Complex DCUs are defined recursively, via the attachment of elementary DCUs to DCUs in the growing parse tree, to create one of three types of structures:

- coordinations, including lists such as topic chains and narratives
- subordinations, including elaborations and interruptions
- binary-attachments, including adjacency pairs, logical relations and rhetorical structures

LDM can be viewed as a limited-lookahead parser which accepts elementary DCUs as input and builds simultaneously a structural and semantic discourse representation [SP88]. A discourse is represented as an open right discourse parse tree, composed of C (coordination), S (subordination)
and B (binary-attachment) non-terminal nodes and elementary DCUs as terminal nodes. Each elementary input DCU is attached as the right child of an available node, either existing or newly created. As in [Web91], only DCUs on the right edge of the growing parse tree are available for the attachment of an incoming DCU; the operations for attaching an incoming DCU correspond to [Web91]'s attachment and adjunction operations [Gar97b]. Right attachment is exemplified in Figure 2.5.

Figure 2.5: LDM Right-Attachment Operation

2.4.2 The Syntax-Semantic Interface

Simultaneous with the incremental construction of discourse structure, the semantic representation of the discourse is updated with the interpretation of the incoming DCU. LDM is in essence a typed unification-based sentence grammar augmented with a set of discourse grammar rules. Each DCU contains semantic information in the form of typed feature structures, where types are ordered along a type hierarchy which allows for information inheritance and type unification. (Footnote 10: Additional information is also contained in each DCU, as mentioned above; see [Pol96] for details.) For example, the feature structure for the basic DCU John smiled is shown in (2.56), where basic represents the type of elementary DCUs and smile(john) represents the semantics of John smiled. The SCHEMA feature is identical to the SEM feature in basic DCUs.

(2.56)  basic
        [ SEM     smile(john)
          SCHEMA  smile(john) ]
Discourse grammar rules then specify how to combine DCUs into bigger DCUs, and how the feature structures of the child constituents combine to yield the semantics of the parent constituent. DOs signal the application of these grammar rules. For example, construction of a coordination structure can be illustrated by the grammar rule for a list, stated as in (2.57). In [PSvdB94], coordinating conjunctions (and, or, ...) signal that a list is to be constructed or extended.

(2.57)  list                               dcu                   dcu
        [ SEM     list(σ1, σ2)       →     [ SEM     σ1    ,     [ SEM     σ2
          SCHEMA  gen(σ1, σ2) ]              SCHEMA  σ1 ]          SCHEMA  σ2 ]

Syntactically, this rule states that any two discourse trees can combine to form a new tree of type list. Semantically, this rule produces a list relation between the two trees (indicated by the feature SEM), and constrains this relation to be between two trees whose generalization is non-trivial, where the generalization of two terms σ1 and σ2 yields the most specific term which subsumes both σ1 and σ2. The SCHEMA feature indicates this term. For example, given the list John smiled and Phil cried, the value of the SCHEMA feature will be the generalization of the two clauses, roughly man expressed emotion, where man and expressed emotion are sets of objects sharing a common property. (Footnote 11: See [PSvdB94, vdB96] for the formal computation of SEM and SCHEMA in DQL.) Note that this rule also extends an existing list relation, again provided the generalization is non-trivial.

In [Lag98]'s version of LDM, the rule for constructing rhetorical coordinations (e.g. a binary-attachment) is very similar to the rule for lists, except that a (non-trivial) generalization is not required. Rhetorical coordinations are signaled by DOs such as therefore, so, thus, accordingly. The construction of subordinations, which may be signaled by DOs such as because and since, differs from coordinations in that no generalization is calculated. The SCHEMA and SEM features of the parent DCU in subordination structures are given the values of the main clause.
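The semantic side of the list rule can be made concrete: compute the parent's SEM as a list relation over the daughters' SEM values, and its SCHEMA as their generalization over a type hierarchy. The toy hierarchy, the predicate-argument encoding, and the failure behaviour below are our own illustrative assumptions, not the DQL formalization of [PSvdB94].

    # Toy ISA hierarchy; "entity" plays the role of a trivial (top) generalization.
    ISA = {
        "john": "man", "phil": "man", "man": "entity",
        "smile": "express_emotion", "cry": "express_emotion", "express_emotion": "event",
        "event": "entity",
    }

    def ancestors(term):
        chain = [term]
        while term in ISA:
            term = ISA[term]
            chain.append(term)
        return chain

    def generalize(t1, t2):
        """Most specific term subsuming both t1 and t2 in the toy hierarchy."""
        a2 = set(ancestors(t2))
        return next(a for a in ancestors(t1) if a in a2)

    def list_rule(dcu1, dcu2):
        """Combine two DCUs into a list DCU, requiring a non-trivial generalization
        of their SCHEMA values (here: predicate and first argument separately)."""
        (p1, x1), (p2, x2) = dcu1["SCHEMA"], dcu2["SCHEMA"]
        schema = (generalize(p1, p2), generalize(x1, x2))
        if "entity" in schema:                      # only the trivial generalization exists
            raise ValueError("list rule not applicable: generalization is trivial")
        return {"TYPE": "list", "SEM": ("list", dcu1["SEM"], dcu2["SEM"]), "SCHEMA": schema}

    john_smiled = {"TYPE": "basic", "SEM": ("smile", "john"), "SCHEMA": ("smile", "john")}
    phil_cried  = {"TYPE": "basic", "SEM": ("cry", "phil"),   "SCHEMA": ("cry", "phil")}
    print(list_rule(john_smiled, phil_cried)["SCHEMA"])   # ('express_emotion', 'man')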
2.4.3 Retrieving Antecedents of Anaphora from the Tree

LDM claims to model constraints on the antecedents of anaphora of incoming DCUs, as illustrated in example (2.58), taken from [PvdB99].

(2.58a) Susan came home late yesterday.
(2.58b) Doris had held her up at work.
(2.58c) She didn't even have time for dinner.

In (2.58), the relationship between DCU (a) and DCU (b) is a subordination, because DCU (b) supplies more detailed information about why Susan came home late. DCU (c) continues describing the state of affairs of Susan's evening, and is therefore in a coordination relation with (a). Due to the specification of the semantics at each node that is provided in the LDM grammar rules for subordination and coordination, only Susan is available as a potential referent for the anaphor she in (c). The tree for this discourse is shown in Figure 2.6, where the feature structures have been simplified to show only the type of DCU and the available antecedents.

    coordination [Susan]
        subordination [Susan]
            (a) [Susan]
            (b) [Susan, Doris]
        (c) [Susan]

Figure 2.6: LDM Tree Structure for Example (2.58)
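The claim illustrated by Figure 2.6, that only Susan remains accessible when (c) is attached, can be phrased procedurally: the antecedents available to the incoming DCU are those recorded on the node it attaches to and on that node's ancestors. The tree encoding below is our own, and this accessibility rule is a simplification of LDM's actual semantics.

    def available_antecedents(attachment_node):
        """Antecedents recorded on the attachment node and its ancestors."""
        available, node = set(), attachment_node
        while node is not None:
            available |= set(node["antecedents"])
            node = node.get("parent")
        return available

    # The subordination node over (2.58a)-(2.58b): by the subordination rule its
    # SEM/SCHEMA come from the main clause (a), so only Susan is recorded there.
    subord = {"type": "subordination", "antecedents": ["Susan"], "parent": None}
    print(available_antecedents(subord))    # {'Susan'} -- Doris, inside (b), is not passed up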
    • the tree. That it cannot be read off the root is relatively obvious; that it cannot be read off the right frontier is illustrated using (2.59). The LDM tree associated with this example is shown in Figure 2.7; where only the semantics of each DCU is shown at each node for simplicity. (2.59a) We were going to see our son tonight (2.59b) but we are not (2.59c) because the younger one is coming home for dinner (2.59d) because he is working in the neighbourhood (2.59e) so he is coming home for dinner (2.59f) so we are not a but b —— ˜˜ F € •• • – –– ‘‘ ‘ ’ ‘ ’ ’ ’ ““ “ “ ”” “ ” ” ” j “ “ “ ” “ “ ” ” ” ” ((b because c) so f ) (b because c) f ((c because d) so e) (c because d) E i Figure 2.7: LDM Tree Structure for Example (2.59) If the semantics of the discourse are read off the right frontier and conjoined, we get only: ((a but b) and ((b because c) so f) and f). The second major problem [Gar97b] notes with LDM’s lack of upward percolation is an inability to retrieve the antecedents of discourse deictic reference. As [Web91] argues, the right frontier should represent the available antecedents of discourse deixis; obviously if the right frontier does not contain the required information, it will not be available to the discourse deictic. Gardent proposes an alternative method of tree construction and a specification of the syntaxsemantic interface in which upward percolation is incorporated, along with the discourse grammar rules of LDM. We will discuss her approach in detail in Chapter 4. [Sch97]’s extension of her approach incorporates a semantic-pragmatic interface. 37
2.4.5 Summary

In summary, LDM provides a precise representation of discourse structure, a formal representation of discourse semantics, and a specific calculation of the information available at intermediary nodes. The authors argue that for this reason the LDM model has an advantage over [GS86], which relies on the inference of attentional and intentional states. However, the LDM model makes no reference to intentions at all, and the inference process is not described, although they do note that appeals to inference and world knowledge are restricted to specific moments in the interpretation, i.e. the moment of DCU attachment to the discourse tree. Moreover, while LDM, like [GS86], claims to provide an account of anaphora resolution, Gardent shows that the lack of upward percolation makes it unable to account for the resolution of discourse deixis anaphora.

In the next section, we will discuss a different tree-based model which, though it returns to a descriptive approach to discourse coherence, is widely used, because in addition to providing an extensive description of propositional relations, it also defines the discourse structures that can be produced with them. We will then look at an alternative theory which replaces syntactic structure with a structured semantics, and models world knowledge and the inference of propositional relations. LDM argues this model is less tractable, because it does not separate the syntax and semantics of discourse, and reference to world knowledge and inference is less restricted. The theory goes further than LDM, however, by presenting an account of a separate component of intentions.

2.5 A Descriptive Theory of Discourse Structure

2.5.1 Analyzing Text Structure

Rhetorical Structure Theory (RST) [MT88] is one of the simplest models of discourse, in that it is a purely descriptive theory of text organization. RST describes text structure from the point of view of a text analyst, who has access to the text, knowledge of context and the cultural conventions of the writer, but does not have access to the writer. Therefore, the analyst's job is to judge the most plausible relations that the writer intended to convey. This judgment does not rely on morphological or syntactic signals; the authors claim to have found no reliable unambiguous signals for any
relations. Recognizing intended relations is to rest on functional and semantic judgments alone. The analyst chooses from the list of RST relations in Table 2.10. In this table, relations are grouped according to their similarity in definition; the authors acknowledge that alternative groupings are possible, one of which is the distinction between "subject matter" relations, whose intended effect is that the reader recognizes the relation, and "presentational" relations, whose intended effect is to increase some inclination in the reader, such as the desire to act or believe some assertion. Presentational relations are marked with an asterisk in the table.

Table 2.10: Organizations of RST Relation Definitions

Evidence and Justify: Evidence*, Justify*
Relations of Cause: Volitional Cause, Non-Volitional Cause, Volitional Result, Non-Volitional Result, Purpose
Circumstance
Solutionhood
Antithesis and Concession: Antithesis*, Concession*
Condition and Otherwise: Condition, Otherwise
Interpretation and Evaluation: Interpretation, Evaluation
Elaboration
Restatement and Summary: Restatement, Summary
Background*
Enablement and Motivation: Enablement*, Motivation*
Other Relations: Sequence, Contrast

Nuclearity is assumed to be a central organizing principle of text; for the majority of relations, the pieces of text being related can be distinguished into a nucleus and satellite, with the nucleus representing the writer's main communication and the satellites providing subsidiary information. The prediction is that if the nucleus is removed, the significance of the information in the satellite(s) will not be apparent and therefore the text will be incoherent, but if the satellite is removed, the resulting text will still be coherent and resemble the original in the form of a "synopsis".

RST relations are thus defined in terms of nuclearity and the writer's intent. As an example, the definition of the Evidence relation is given in Table 2.11, where R represents the reader, W represents the writer, N represents the nucleus, and S represents the satellite. The text analyst uses RST relations to relate text spans. Atomic text spans are generally clauses, except that clausal subjects, complements and restricted relative clauses are not treated in [MT88]
as independent text spans.

Table 2.11: Evidence: RST Relation Definition

relation name:      EVIDENCE
constraints on N:   R might not believe N to a degree satisfactory to W
constraints on S:   R believes S or will find it credible
constraints on N+S: R's comprehending S increases R's belief of N
the effect:         R's belief of N is increased

Complex text spans are structures called schema applications. A schema application is a set of adjacent text spans (atomic or complex) linked by an RST relation according to one of five structural arrangements, called schemas. Each relation has a corresponding schema, exemplified in Figure 2.8; the relations not shown all use the schema labeled with the "circumstance" relation. Arcs represent the relation holding between text spans, which are represented by horizontal lines. The nucleus is distinguished from the satellite by the direction of the arrow, and each vertical line descends from the text span being decomposed by a schema application down to the nucleus of the schema application.

Figure 2.8: RST Schemas

Schemas do not constrain the ordering of nucleus or satellite; they allow a relation that is part of a schema to be applied any number of times, and in multi-relation schemas, they require only one of the relations to hold. However, a number of constraints must be satisfied to produce a valid RST structure. An RST structure must be complete, consisting of a set of schema applications containing
a schema application that constitutes the entire text. Every text span, except for the entire text itself, must be connected as either an atomic span or a constituent of another schema application. Each schema application must be unique, consisting of a different set of text spans, and in a multi-relation schema, each relation must apply to a different set of text spans. Finally, adjacency must be satisfied, in that the result of each schema application constitutes one text span. An example of an RST structure consisting of two evidence relations is shown in Figure 2.9. Each text span is numbered; atomic numbers correspond to the text units in (2.60), and complex numbers represent undecomposed units of the structure.

(2.60) (unit 1) This computer tax program really works.
       (unit 2) In only a few minutes I finished my tax return.
       (unit 3) I printed it for you to see.

Figure 2.9: Evidence Relation

[MT88] allow that multiplicity may arise from text ambiguity and resulting differences in analysts' judgments about the relations holding between text spans. It is assumed, however, that the more coherent the writer has made the text, the clearer each choice of relation is for the text analyst.
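The four structural constraints just listed (completeness, connectedness, uniqueness, adjacency) are easy to state as checks over a set of schema applications. The representation below, a schema application as a relation name plus the spans it covers, is our own simplification for illustration, adjacency is checked only as contiguity of the covered units, and the structure used in the example is one plausible reading of Figure 2.9.

    def valid_rst_structure(text_units, schema_applications):
        """schema_applications: list of (relation, spans), where each span is a
        (start, end) pair of unit indices and the last span listed is the result."""
        whole = (1, len(text_units))
        results = [spans[-1] for _, spans in schema_applications]
        # Completeness: some schema application results in a span covering the whole text.
        complete = whole in results
        # Connectedness: every unit, and every result except the whole text,
        # appears as a constituent of some schema application.
        constituents = {s for _, spans in schema_applications for s in spans[:-1]}
        connected = all((i, i) in constituents for i in range(1, len(text_units) + 1)) and \
                    all(r in constituents for r in results if r != whole)
        # Uniqueness: no two schema applications cover the same set of spans.
        unique = len({tuple(spans) for _, spans in schema_applications}) == len(schema_applications)
        # Adjacency (simplified): the result of each schema application is one contiguous span.
        adjacent = all(start <= end for start, end in results)
        return complete and connected and unique and adjacent

    # Unit 2 as evidence for unit 1 yields span (1,2); unit 3 as evidence for (1,2) yields (1,3).
    apps = [("EVIDENCE", [(2, 2), (1, 1), (1, 2)]),
            ("EVIDENCE", [(3, 3), (1, 2), (1, 3)])]
    print(valid_rst_structure(["u1", "u2", "u3"], apps))   # True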
    • 2.5.2 The Need for Multiple Levels of Discourse Structure [MP92] provide an influential argument against [MT88]’s claim that though subject-matter and presentational relations can be distinguished, in general, there will be a single preferred relation holding between consecutive text spans. They argue that this distinction is in fact a conflation of the two levels of discourse interpretation identified by [GS86]: the informational level, and the intentional level, respectively, and that a complete model of discourse cannot depend on analyses in which these levels are in competition. They support this argument first by showing both that information can flow between these levels to produce the relations between text spans in a discourse, as illustrated with (2.61)-(2.62): (2.61) George Bush supports big business. (2.62) He’s sure to veto House Bill 1711. [MP92] argue that the presentational relation EVIDENCE is a plausible RST relation between the nucleus (2.62) and the satellite (2.61) (Table 2.11). Equally plausible, however, is the subjectmatter VOLITIONAL-CAUSE relation, where (2.62) is the nucleus and (2.61) the satellite. [MT88]’s definition of a Volitional-Cause relation is given in Table 2.12. Table 2.12: Volitional-Cause: RST Relation Definition relation name constraints on N constraints on S constraints on the N+S combination the effect VOLITIONAL-CAUSE N presents a volitional action or a situation that could have arisen from a volitional action. none S presents a situation that could have caused the agent of the volitional action in N to perform that action; without the presentation of S, R might not regard the action as motivated or know the particular motivation; N is more central to W’s purposes in putting forth the N-S combination than S is. R’s recognizes the situation presented in S as a cause for action of N 42
    • [MP92] argue that if the reader knows (and knows that the writer knows) that the bill places stringent controls on manufacturing, then s/he can conclude that (2.61) is evidence for (2.62), thus reasoning from information to intention. Alternatively, if the reader doesn’t know anything about the bill, but expects the writer to support the claim that Bush will veto it, then s/he can conclude that (2.61) is a cause of (2.62), thus reasoning from intention to information. [MP92] also show that spans can be related simultaneously on both levels, as in (2.63). At the informational level, a plausible RST analysis is that (2.63 c) is the nucleus, the writer’s main informational communication, (2.63 a) is a CONDITION on (2.63 b), and (2.63 a) and (2.63 b) together are a CONDITION on (2.63 c). This RST structure is shown first in Figure 2.10. Suppose however that the writer is planning a surprise party for the reader. Then at the intentional level, (2.63 a) is the nucleus, the action that the writer wishes the hearer to perform, (2.63 c) MOTIVATES (2.63 b), and together they MOTIVATE (2.63 a). This RST structure is shown second in Figure 2.10. (2.63 a) Come home by 5:00. (2.63 b) Then we can go to the hardware store before it closes. (2.63 c) That way we can finish the bookshelves tonight. Figure 2.10: RST Condition and Motivation Relations Because the intentional and informational structures for this discourse are not isomorphic, they cannot be produced simultaneously. 43
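The non-isomorphism claim can be checked mechanically. The sketch below is ours, not [MP92]'s; the encoding of an RST analysis as a nested (relation, nucleus, satellite) tuple is a simplification, but it suffices to show that the two analyses of (2.63) induce different constituent groupings and therefore cannot be read off a single tree.

    # Trees are encoded as (relation, nucleus, satellite); leaves are unit names.
    informational = ("CONDITION", "c", ("CONDITION", "b", "a"))    # c is the overall nucleus
    intentional   = ("MOTIVATION", "a", ("MOTIVATION", "b", "c"))  # a is the overall nucleus

    def shape(tree):
        """Return the unlabeled constituent structure: which units are grouped together."""
        if isinstance(tree, str):
            return frozenset([tree]), {frozenset([tree])}
        _, nuc, sat = tree
        n_units, n_groups = shape(nuc)
        s_units, s_groups = shape(sat)
        units = n_units | s_units
        return units, n_groups | s_groups | {units}

    print(shape(informational)[1] == shape(intentional)[1])   # False: the groupings differ

The informational analysis groups (2.63 a) with (2.63 b), while the intentional analysis groups (2.63 b) with (2.63 c); no single bracketing yields both.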
    • 2.5.3 “Elaboration” as Reference [KOOM01] show that the object-attribute elaboration relation over- and under- generates RST structures. [MT88] define the elaboration relation (of which the object-attribute type is subset 5) as shown in Table 2.13. Table 2.13: Elaboration: RST Relation Definition relation name constraints on N constraints on S constraints on the N+S combination the effect ELABORATION none none S presents additional detail about the situation or some element of the subject matter which is presented in N or inferentially accessible in N in one or more of the ways listed below. In the list, if N presents the first member of any pair, then S includes the second. 1. set : member 2. abstract : instance 3. whole : part 4. process : step 5. object : attribute 6. generalization : specific R’s recognizes the situation presented in S as providing additional detail for N. R identifies the element of subject matter for which detail is provided. In words, [MT88] define the object-attribute elaboration relation to hold between N and S if N ‘presents’ an object (e.g. contains a mention of it), and S subsequently present an attribute of this object. [KOOM01] have found this relation to be widely used in museum guidebooks. They illustrate the fact that it under-generates with (2.64 a)-(2.64 d): (2.64 a) In the women’s quarters the business of running the household took place. (2.64 b) Much of the furniture was made up of chests arranged vertically in matching pairs. (...) 44
(2.64 c) Female guests were entertained in these rooms, which often had beautifully crafted wooden toilet boxes with fold-away mirrors and sewing boxes, and folding screens, painted with birds and flowers.
(2.64 d) Chests were used for the storage of clothes. . .

In (2.64 b), an object, chests, is mentioned. Discussion of this object is taken up again in (2.64 d). The text is clearly coherent, but the desired RST analysis, in which (2.64 d) is a satellite of (2.64 b) under object-attribute elaboration, is not possible, because (2.64 b) and (2.64 c) are both already satellites of (2.64 a) under an elaboration relation.

[KOOM01] illustrate the fact that the object-attribute elaboration relation over-generates by comparing (2.65 a)-(2.65 b) with (2.66 a)-(2.66 c):

(2.65 a) Arts-and-Crafts jewels tend to be elaborate.
(2.65 b) However, this jewel has a simple form.

The discourse in (2.65 a)-(2.65 b) displays a concession relation between the satellite, (2.65 b), and the nucleus, (2.65 a). However, when an object-attribute elaboration relation intervenes, as in (2.66 a)-(2.66 c), the discourse is incoherent, although RST allows the structure, with (2.66 c) now being the satellite for the complex unit (2.66 a)-(2.66 b), with (2.66 a) as the nucleus.

(2.66 a) Arts-and-Crafts jewels tend to be elaborate.
(2.66 b) They are often mass-produced.
(2.66 c) However, this jewel has a simple form.

If compositionality, continuous constituency, and a tree structure are maintained as RST assumptions, then RST under-generates. [Sib92] has argued for relaxing the continuous constituency constraint (i.e. that S be adjacent to N, or adjacent to a satellite of N). [KKR91] has argued for relaxing the tree structure constraint (i.e. that each text span, except for the span constituting the entire text, be involved in exactly one schema application, with no overlapping spans, and no spans not linked to other spans). [KOOM01] take another tack, arguing that the over- and under-generation problems arise because object-attribute elaboration is not a direct relation between propositions, but rather a direct (e.g. identity) relation between objects, and a spurious association between propositions. They argue that local and global focus mechanisms [WJP81, GS86] make the use of this relation redundant. They propose a revision of RST in which the object-attribute elaboration relation is removed, and high-level text spans are related not by RST relations, but by mechanisms of global focus such as informationally redundant utterances (IRUs) [Wal93], nominalization, and discourse deixis.

In [KOOM01]'s model, a coherent text is a sequence of focus spaces, called entity-chains, that succeed each other in a legal manner, as exemplified in Figure 2.11.

Figure 2.11: [KOOM01]'s Discourse Model

In the figure, each entity-chain, labeled EC1, etc., has as its global focus an entity E, and consists of a sequence of RST trees, either atomic text spans, shown as small boxes, or complex structures of RST relations (minus the object-attribute elaboration relation), shown as triangles, in which the root nucleus of each tree is about E. In a legal sequence of entity-chains, the focused entity in each chain has been mentioned somewhere in the previous n propositions, where the value of n is still an open question. They call this a resumption relation, indicated by directed arcs in the figure, and its felicitous use is claimed to be a function of the linear distance from any previous mention of the focused entity, rather than a function of its relationship to the right frontier of a discourse structure tree.

2.5.4 Summary

RST claims to be able to describe the majority of naturally occurring text, and has found wide application in the literature. For example, [Fox87] has demonstrated how human explanations of the choice between pronouns and full NPs in expository text can be derived from RST structures, [SdS90] define a set of heuristics for recognizing RST relations, [Mar97] has built an RST annotation tool and manual in which users are instructed to select from a hierarchical version of the RST relations (e.g. they choose elaboration only if no other relation fits), and [Mar00] shows that an automated method of labeling RST-type relations can partially replicate human annotation, if "cue phrases" and punctuation are used as signals of the presence of particular RST relations. RST-type relations have also been used in combination with other intentional relations as planning operators in natural language generation systems (see [Hov93, MP93]).

However, a number of important limitations inherent in the RST approach have also been raised. First, Moore and Pollack show that RST conflates the informational and intentional levels of discourse within a single tree structure. This has serious consequences. For example, an RST-based system such as [Mar99] enables one to automatically derive and enumerate all possible RST interpretations of a text, but does not provide a mechanism for choosing between them. Similarly, [Hov93]'s RST-based system allows for multi-relation definitions that assign two labels to consecutive discourse elements. Neither system, however, accommodates concurrent, non-isomorphic interpretations. This is a problem for all tree-based approaches; if both levels are treated structurally, then multiple structures will always have to be considered.

Second, [KOOM01] showed that the object-attribute elaboration relation both over- and under-generates the space of discourse structures in an RST theory. Rather than revise the assumptions inherent in a tree structure, they propose that tree structures should hold only within "entity chains", which correspond roughly to discourse segments associated with focus spaces in Grosz and Sidner's theory. The difference is that Knott et al. do not constrain focus spaces as highly as a stack or a tree structure does; rather, they propose a linear constraint on the number of intervening "entity chains" between a reference to an entity and its prior mention in an earlier entity chain.

More generally, RST is simply not a complete model of discourse structure, in that it relies wholly on readers' intuitions, says nothing about sub-clausal coherence, defines no formal mechanism for computing the relations between text spans, and provides no objective method of justifying its choice of relations over any other. Because RST makes use of so many more relations than any of the other models we have seen, such justification is important; we need to propose some mechanism by which these relations are produced, especially if they intuitively seem capable of describing the majority of texts.
    • In the final section of this chapter, such a mechanism is proposed. In the next section, however, we present an alternative theory of discourse coherence, one which models every major characterization of discourse coherence that we have so far seen in this chapter without reference to a syntactic tree structure. 2.6 A Semantic Theory of Discourse Coherence SDRT [Ash93, LA93] is a dynamic semantics approach to discourse coherence. It has two main modules: a formal language for representing discourse context [Ash93], and a theory of discourse inference (DICE) [LA93] which is used to compute discourse relations. [LA99] further propose a separate module for limited reasoning about intentions and cognitive states. SDRT is also shown to provide an account of anaphora resolution; in particular, the semantics it proposes incorporates the semantic objects referred to by discourse deixis. We now return to our discussion of discourse deixis, in terms of these semantic objects; this discussion which will provide the background for our studies in Chapter 3. 2.6.1 Abstract Objects Since the earliest work in logic and linguistics, propositions have been viewed as the semantic interpretation of sentences (c.f. [Mon74, CQ52]); studies of adverbial modification and tense in formal linguistics have also made reference to eventuality interpretations (c.f. [Dav67, MS88]). Propositions, as well as states-of-affairs, properties, facts, causes, and effects, have no spatio-temporal location. For some eventualities too, it may be difficult to pinpoint precise spatio-temporal or sensory coordinates: the fall of the Roman empire, for example [Ven67]. These interpretations are abstract. Early work in natural language philosophy (c.f. [Ven67, Aus61, Str59]) provides discussion of the properties of these interpretations. The precise nature of these abstractions has nevertheless proven difficult to pin down, partly because they can be difficult to distinguish. For example, the sentence John went to the store can be interpreted as an event, it can be attributed a truth value, it can be viewed as a surprising fact, or it can be interpreted as a result of John’s needing clothes. We can view these interpretations as objects, 48
    • as evidenced by the fact that we refer to them using noun phrases, e.g. the fall of the Roman Empire. That these objects can be distinguished is evidenced by the fact that we can explicitly indicate which interpretation is being referred to, e.g. the fact that the Roman empire fell, the event of the Roman empire falling, etc. Vendler argues that we implicitly distinguish these interpretation by the way we nominalize them. Nominalization transforms a sentence into a noun phrase. The “essential ingredient” of a nominalization is a verb derivative. Vendler argues that nominalizations fall into perfect and imperfect classes, as in Table 2.14. Table 2.14: [Ven67]’s Imperfect and Perfect Nominalizations imperfect nominals (2.67) It is fortunate that John has arrived unexpectedly. (2.68)John’s having arrived unexpectedly surprised me. perfect nominals (2.69) The beautiful singing of the Marseilles took all afternoon. (2.70) The unexpected arrival went unobserved. As shown in (2.67), noun clause nominalizations allow the verb to take tense, auxiliary verbs and adverbs, but they cannot be modified by articles or adjectives (e.g. *the beautiful that John arrived). As shown in (2.68), -ing forms of verbs in nominalizations can take either tense, auxiliary verbs and adverbs, or articles and adjectives, as shown in (2.69), but not both at once (e.g. *the beautiful singing *unexpectedly). As shown in (2.70), nouns derived from verbs in nominalizations cannot take tense, auxiliary verbs or adverbs (e.g. the *unexpectedly *having arrival); they can only take articles or adjectives. Since tenses, auxiliaries and adverbs characterize verbs, and are generally incompatible with articles and adjectives in nominalizations, and are not permitted at all in perfect nominalizations, Vender concludes that in perfect nominalizations, the verb is dead as a verb, while in imperfect nominalizations, the verb is still alive as a verb. Vendler further distinguishes loose containers and narrow containers (i.e. the rest of the sentence), as in Table 2.15. Narrow containers describe spatio-temporal qualities of events. They permit only perfect nominalizations, in which, if there is a verb, it is dead. Narrow containers also 49
    • take nouns which can behave like perfect nominals, e.g. fires and blizzards, unlike tables and cows, can occur, begin, end, can be sudden or prolonged; they can be read as events. Table 2.15: [Ven67]’s Loose and Narrow Containers loose subject containers loose object containers narrow subject containers narrow object containers narrow PP containers is probable/certain/unlikely/surprising/a fact he denied/mentioned/remembered is slow/gradual/an event/a process/an action, occurs/begins I watched, I heard, I felt, I observed until/before/after/since Loose containers ascribe properties to facts. They permit both imperfect and perfect nominalizations. When they take an imperfect nominalization, the verb is alive; when they take a perfect nominalization, however, an alive verb is attributed to it, as in (2.71). When they take simple nouns, they are read as suppressed imperfect nominals,as in ((2.72). Moreover, if a sentence is not nominalized, it will be relativized by a relative clause in the form of a loose container (2.73). (2.71) John’s singing of the Marseilles surprised me. (read: that he sang the Marseillaise) (2.72) The abominable snowman is a fact. (read: the existence of the snowman) (2.73) John died, which surprised me. vs. *John died, which was slow. Vendler uses these distinctions to investigate the nature of other abstract objects, such as effects and results. Examples such as (2.74) - (2.75) indicate that effects describe perfect nominals, e.g. events, changes, or processes. An effect can reach a large area, can be felt, measured, registered, can be violent and dangerous. These examples also show that effects attribute events to other events, also denoted by a perfect nominal. Similarly, other members of the effect family of terms, product, work, creation, upshot, issue, outcome are predicated of events. (2.74) The moon’s position has an effect on the movement of the oceans. (2.75) *That the moon has its position has an effect on the ocean’s having movement. Results, on the other hand, like causes, consequences, reasons, motives, and explanations, can be stated, told, believed, probable or improbable, and sometimes fortunate or unfortunate, expected or unexpected, sad, disastrous, or horrible, i.e., describe fact interpretations (2.76). 50
    • (2.76) That the oceans have movement is a result of the moon having its position. These interpretations are difficult to distinguish; what emerges most clearly from this discussion is that many different abstract interpretations can be conveyed as the interpretations of sentences. Of course, nominalizations are not the way to convey these sentences, nor are single sentences the only carriers of these interpretations. Discourse deixis, as we have seen, provides another mechanism for conveying them. And discourse deixis can refer to not only to the interpretations of sentences but to a wide variety of syntactic constituents, including verbal predicates, as in (2.31, sequences of sentences, as in (2.33), and untensed clauses, as in (2.43), all of which are repeated below. (2.31) John [smiled]. He does that often. (2.33) [I woke up and brushed my teeth. I went downstairs and ate breakfast, and then I went to work.] That’s all I did today. (2.43) [ It talks about [ how to [ go about [ interviewing ]]] and that’s going to be important. What’s more, discourse deixis reference indicates that the range of abstract objects is even wider than Vendler addressed. For example, discourse deixis can also refer not only to speech act interpretations of sentences, as [Web91] noted (Table 2.9, shown below as 2.77), but as [DH95] have shown, to the discourse relation between sentences, such as the contrast relation in (2.50), repeated below, and even to a presupposed defeasible rule arising from a discourse relation between sentences (See [Kno96] and below for a discussion of these rules), such as in (2.78), where “if it’s raining the sun isn’t shining” is presupposed and denied. (2.77) Speaker A: John speaks loudly. Speaker B: Repeat that. (2.50) [If a white person drives this car it’s a “classic”. If I, a Mexican-American, drive it, it’s a “low-rider”.] That hurts my pride. (2.78) [The sun is shining although it’s pouring rain.] That’s a rule Bermuda always breaks. Essentially, there are as many abstract objects as there are abstract nouns, and they can be referred to implicitly via discourse deixis, or explicitly with a demonstrative inferable NP [Pri81], e.g. that fact. Therefore, in this thesis we will extend the term “abstract objects”12 to cover all of 12 It appears this term was coined by [Ash93]. 51
    • these possibilities. [Ash93] classifies the range of “saturated” abstract objects as shown in Figure 2.12. Also contained within these classification nodes, but subject to a slightly more complex linguistic analysis, are “unsaturated abstract objects”. This difference is akin to [DH95]’s “abstraction operation”: some member of the antecedent clause must be abstracted, replaced or otherwise altered to achieve the reference to an unsaturated abstract object. In (2.79), for example, we must alter the referent to the form “Mary should go out with an entity coindexed by the speaker”, and in (2.80) we must change the voice of the antecedent to “take the garbage out”: (2.79 [John said that Mary should go out with him], and Bill said that too. (2.80) [The garbage had to be taken out], so that’s what Bill did. Figure 2.12: [Ash93]’s Classification of Abstract Objects Asher’s classification organizes abstract objects along a scale of “concreteness”; eventualities are the most concrete and proposition-like objects are the least concrete abstract objects. Reminiscent of Vendler, Asher argues that eventualities behave most like concrete NP entities; they are located in space and time and can also be causal (2.81); in contrast, fact-like objects are not located in space and time but can be causal (2.82); finally, proposition-like objects are neither located in space or time, nor can they be causal (2.83). In these examples, we use the demonstrative inferable NP form to make clear the AO interpretation of the first clause. (2.81) The Jets scored. That event caused the Patriots’ coach to throw down his headphone. (2.82) The Jets scored. That fact caused the Patriots’ coach to throw down his headphone. 52
    • (2.83) The Jets scored. *That proposition caused the Patriots’ coach to throw down his headphone. The difference between reference to propositions and facts is slight but interpretable: reference to propositions predicates on a truth value, which appears to be the only property that can be attributed to a proposition. In contrast, reference to a fact takes its truth value for granted [Eck98]. 2.6.2 A Formal Language for Discourse In Section 2.3.5 we reviewed [Web91]’s use of referring functions, [Sto94]’s use of information states, and [DH95]’s set of operations, all of which were models of how abstract objects are created. [Ash93]’s model uses a formal language of discourse that is an extension of DRT [KR93]. DRT is a dynamic semantics, analyzing meaning in two steps. First, the DRS construction algorithm provides a set of rules for the incremental (sentence-by-sentence) construction of a semantic representation of a discourse, called a discourse representation structure (DRS). Second, the DRT correctness definition provides instructions for homomorphically embedding a DRS in a model to produce the truth conditions for a discourse. Note that while bottom-up DRS construction is largely compositional, there is no Montague notion of compositional semantics at the discourse level, where rules for semantic interpretation correspond to syntactic rules of construction. Essentially, DRS construction translates terminal nodes of the syntactic tree for each sentence in a discourse into a DRS, whose union then forms the DRS of the discourse, according to the instructions for correctness. A DRS has two parts: a universe, containing the relevant discourse entity references, and a condition set, containing n-place DRS predicates with discourse entity references as arguments. In DRS (a) shown in Figure (2.13), x and y are discourse entity references; the condition set says that x is a boy, y is Fred, and s kicks y. To get the DRS (b) for all of the discourse, DRS (a) serves as context; the discourse referents introduced by the second sentence are entered into the condition set and universe created in (a). As shown DRS (b) contains one incomplete condition (z = ?); anaphora resolution is required, to replace ? by a discourse referent other than z. Accessibility of a discourse referent x to a discourse referent y is a constraint which says roughly that x is accessible to y if it is in the universe of a DRS to the left, super-ordinate to, or the same as the universe in 53
which y is declared. After identifying ? with y, the correctness definition says that (b) is a proper embedding in a model M iff M contains Fred and Max such that Max kicked Fred and Fred cried. As shown in (c), sentential operators such as if...then create subDRSs, as do many determiners, negation, attitudes, belief operators, nominalizations, gerunds, complement clauses, etc.

Figure 2.13: Sample DRSs

Like [Sto94], Asher does not use coercion to create abstract object interpretations; like [Sto94], he assumes these interpretations are already present, in this case, as additional variables in DRSs. To achieve reference to eventualities, Asher simply adds an event variable e (or a state variable s) to the DRS translation of verbs; this variable can be identified by anaphora resolution with the variable introduced by the discourse deictic anaphor. Event summation is then defined to permit reference to complex eventuality interpretations (e.g. 2.33). To achieve reference to proposition and fact interpretations, Asher simply allows the variable introduced by a discourse deictic anaphor to be identified directly with an accessible DRS (roughly, if the variables in the universe of a DRS are accessible to a discourse referent x, so is the DRS itself).

As defined, however, DRT construction provides no account of discourse relations or discourse segmentation (or the ability of a discourse deictic to refer to a discourse segment). To build a DRS for a discourse as a whole, one simply adds the DRS constructed for each incoming sentence to the DRS one already had. Asher postulates an additional level of discourse interpretation to provide a semantic theory of discourse structure; the resulting structure is called a segmented DRS (SDRS). Essentially, an SDRS is a recursive structure of labeled (S)DRSs, with discourse relations between the labels, and a partial ordering on both the (S)DRSs and the discourse relations. The relations have a truth-conditional content concerning the relation of constituent (S)DRSs to each other, but not all relations have a truth-conditional impact on the content of what is said, and the discourse dominance hierarchy represented in an SDRS also has no truth-conditional impact on the content of what is said. Note that while DRSs are among the basic constituents of an SDRS, and DRSs are typically viewed as corresponding to sentences, Asher leaves open the question of whether constituent DRSs can correspond to a clause or several sentences, etc., noting that purpose plays an important role in determining individual segments [GS86]. Each SDRS also contains a distinguished DRS, a discourse topic, which summarizes the content of a constituent in an SDRS and bears a particular structural relation to that constituent.

In an SDRS, every new constituent (S)DRS must be attached to an antecedently constructed constituent. Only open or d-free constituents, however, are available candidates for attachment. The theory distinguishes topic-updating and SDRS-updating with an incoming DRS, via distinguished discourse relations. Continuation and Elaboration relations are distinguished as topic-based; to attach to a constituent they require only that it be open. The current constituent is the constituent DRS containing the information from the previous sentence, and is always open. Also open is the SDRS in which the current constituent occurs, and any SDRS that encloses that, etc. This corresponds roughly to the right frontier of the discourse syntactic tree, but it is more general, akin to [KOOM01]'s notion of "resumptive links", because subordinate constituents are not treated as part of the attachment. Non-topic-based relations require the constituent to which the incoming constituent is attached to be both open and d-free, where d-free means roughly that the constituent is not contained within an SDRS whose topic subsumes it.

An example of a simple SDRS K is shown in Figure 2.14, consisting of a (sub)SDRS K1 and a DRS k0 specifying the discourse topic (indicated by the arrow), which summarizes these DRSs and is in an Elaboration relation with K1. K1 consists of DRSs k1-k3 and the hierarchical discourse relations that hold between them.

Figure 2.14: Sample SDRS

More complex SDRSs result from relations between non-adjacent constituents. For example, in (2.84), k4 is related by the topic-based Continuation relation to k1.

(2.84) (k1) I ate a lovely dinner. (k2) I had salmon. (k3) I had tiramisu. (k4) Then I went for a walk.

In this case, the SDRS K contains a discourse topic k0, a subSDRS K1, and an Elaboration relation between them. K1 contains a subSDRS K2, its discourse topic k1, and an Elaboration relation between them. k1 is also in a Continuation relation with k4. K2 contains k2 and k3 and a Continuation relation between them. As this example shows, multiple discourse relations between the constituents of an SDRS are possible, so long as they are accessible. The process of inferring discourse relations is described below.
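For concreteness, the SDRS for (2.84) can be represented as a recursive structure of labeled constituents, relations over labels, and distinguished topics, as in the following sketch. It is ours, not [Ash93]'s formalism: the class names, the wording of the summary DRS k0, and the placement of k4 inside K1 are assumptions made only for exposition.

    from dataclasses import dataclass, field

    @dataclass
    class DRS:
        label: str
        text: str

    @dataclass
    class SDRS:
        label: str
        constituents: list                               # DRSs and/or embedded SDRSs
        relations: list = field(default_factory=list)    # (relation, label1, label2) triples
        topic: str = None                                # label of the distinguished topic DRS

    k0 = DRS("k0", "summary: I ate a lovely dinner and then went for a walk")
    k1 = DRS("k1", "I ate a lovely dinner.")
    k2 = DRS("k2", "I had salmon.")
    k3 = DRS("k3", "I had tiramisu.")
    k4 = DRS("k4", "Then I went for a walk.")

    K2 = SDRS("K2", [k2, k3], [("Continuation", "k2", "k3")])
    K1 = SDRS("K1", [k1, K2, k4],
              [("Elaboration", "k1", "K2"), ("Continuation", "k1", "k4")], topic="k1")
    K  = SDRS("K",  [k0, K1], [("Elaboration", "k0", "K1")], topic="k0")

The point of the encoding is simply that constituents are related via their labels, so that more than one relation can hold of the same constituent (here, k1 both elaborated by K2 and continued by k4), as the text above describes.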
2.6.3 Retrieving Antecedents of Anaphora from the Discourse Structure

In general, the effect of updating an SDRS is that whether a discourse referent or discourse structure may be anaphorically linked to another discourse referent depends on whether the latter occurs in the constituent in which the anaphoric element occurs, or in a constituent that bears a discourse relation to that constituent. Other constituents will not yield potential antecedents. For example, in Figure 2.14, this availability constraint predicts that a discourse referent in k3 can be resolved to one in k2, because k3 is in a Continuation relation with k2; to one in k1, because k2 is in a Continuation relation with k1; and to one in k0, because k0 is in an Elaboration relation with K1. But if the text corresponding to k1 were "Kathleen is taller than the people she works with", and the text corresponding to k3 were "She teaches with them", this constraint would correctly predict that them could not be resolved to the people she works with. However, based on examples like (2.85), Asher relaxes this constraint for Parallel and Contrast relations.

(2.85) John does not believe that [Mary is treating him fairly]. But Fred is certain of it.

In such cases, anaphoric reference is successfully made to an embedded DRS (the complement of believe); thus Asher allows discourse referents or structures embedded in a constituent that is related by these relations to another constituent to be the antecedents of discourse referents in the latter.

2.6.4 A System for Inferring Discourse Relations

In [HSAM93]'s approach to inferring discourse relations, discussed in Section 2.2, discourse relations are "proven" through logical abduction, where the total cost of a proof of each possible discourse relation is determined by the sum of the costs of abducing its premises, and the cheapest proof wins. [Lag98] notes three problems with this method. First, only one relation will be chosen. Second, the costs of different discourse relations will be compared regardless of possible inconsistencies. Third, it is not clear how these costs should be determined. While [HSAM93] suggests that psycholinguistic experiments can determine relative costs, in reality the experimentation that would be required to establish these values seems no small feat.

DICE [LA93] is an alternative discourse inference system, used for computing the discourse relations in SDRSs. Like [Keh95], discourse relations are defined in terms of constraints, or rules, that must be satisfied, and like [HSAM93], a system of rule application is defined. Unlike [HSAM93], however, DICE is driven by the process of determining inconsistencies. Multiple discourse relations may hold, so long as they are consistent with each other. Moreover, establishing discourse relations is represented as a consequence of linguistic, world, and lexical knowledge, whose interaction is explicitly defined, rather than via the association of costs with premises13. The definitions of the discourse relations originally discussed in [LA93] are shown in Table 2.16; they are given in [Lag98]'s simplified format (Narration is roughly the Continuation relation defined in the previous section).

Table 2.16: DICE: discourse relation definitions
Narration: ⟨τ, α, β⟩ > Narration(α, β)
Elaboration: ⟨τ, α, β⟩ ∧ Subtype(e_β, e_α) > Elaboration(α, β)
Result: ⟨τ, α, β⟩ ∧ cause(e_α, e_β) > Result(α, β)
Explanation: ⟨τ, α, β⟩ ∧ cause(e_β, e_α) > Explanation(α, β)
Background: ⟨τ, α, β⟩ ∧ overlap(e_α, e_β) > Background(α, β)

These relations are defined as defeasible rules, where > represents a defeasible implication (e.g. normally, if..., then...) governing how each relation is inferred and describing the knowledge that is needed to infer it. In each case this knowledge is defined with respect to an update function ⟨τ, α, β⟩, which represents that a clause α in the discourse τ is updated with a clause β, to which α is related via a discourse relation between α and β. The defeasible rule of Narration is the least demanding: every clause β may be connected with α via Narration. The Subtype predicate in the Elaboration relation requires that the eventuality (state, event or process) in β, e_β, be a subtype of that in α, e_α, such that the information in β extends that in α. In both Explanation and Result relations, these eventualities must be in a causal relation; in the former, e_β is the cause, and in the latter, e_α is the cause. In Background relations, these eventualities must display partial overlap in temporal order, and at least one will be a state.

13 It can be argued that representing these knowledge bases suffers from the same complexity as determining the plausibility of assumptions. See [Lag98] for discussion.
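To give a feel for how such rules can be put to work, the following sketch is ours and not DICE's actual proof theory: DICE is a nonmonotonic logic, not a set-membership check, and the literal strings below are stand-ins for real premises. The sketch treats each rule in Table 2.16 as a set of premises about an update ⟨τ, α, β⟩ and returns every relation whose premises are met; filtering the candidates for consistency is taken up below.

    RULES = {
        "Narration":   set(),                      # the update <tau, alpha, beta> alone suffices
        "Elaboration": {"subtype(e_b, e_a)"},
        "Result":      {"cause(e_a, e_b)"},
        "Explanation": {"cause(e_b, e_a)"},
        "Background":  {"overlap(e_a, e_b)"},
    }

    def candidate_relations(knowledge):
        """Given literals known (or defeasibly derived) about an update, return every
        relation whose premises are satisfied; consistency filtering comes later."""
        return [rel for rel, premises in RULES.items() if premises <= knowledge]

    # Premises as might be supplied by a lexically driven causal law (cf. Table 2.18 below):
    print(candidate_relations({"cause(e_b, e_a)"}))    # ['Narration', 'Explanation']

As the output suggests, more than one rule can fire on the same update; what DICE must then decide is which candidates survive the indefeasible axioms introduced next.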
In DICE, the knowledge needed to defeasibly assume a discourse relation must interact with certain indefeasible axioms, shown in Table 2.17. Only those relations which are consistent with these axioms will be inferred. Each axiom expresses causal and temporal properties of eventualities; ≺ represents a temporal ordering on eventualities, such that in e_1 ≺ e_2 the left argument must precede the right argument. Note that Causes precede Effects is an axiom on lexical knowledge, not discourse relations.

Table 2.17: DICE: Indefeasible axioms
Axiom on Narration: □(Narration(α, β) → e_α ≺ e_β)
Axiom on Elaboration: □(Elaboration(α, β) → ¬(e_α ≺ e_β))
Axiom on Result: □(Result(α, β) → e_α ≺ e_β)
Axiom on Explanation: □(Explanation(α, β) → ¬(e_α ≺ e_β))
Axiom on Background: □(Background(α, β) → overlap(e_α, e_β))
Causes precede Effects: □(cause(e_1, e_2) → ¬(e_2 ≺ e_1))

DICE represents the world and lexical knowledge needed to infer discourse relations in terms of defeasible rules (called laws). Some examples are shown in Table 2.18.

Table 2.18: DICE: Defeasible laws on world knowledge
Push Causal Law: ⟨τ, α, β⟩ ∧ fall(x, e_α) ∧ push(y, x, e_β) > cause(e_β, e_α)
Revolt Law: ⟨τ, α, β⟩ ∧ revolt(x, e_α) ∧ pacified(x, e_β) > ¬overlap(e_α, e_β)

These laws need not be stored in the lexicon; they may be derived anew at the moment two clauses are uttered and need to be associated with each other, i.e. when α is updated with β14. The idea is that if "x revolted" is uttered, followed by "x was pacified", the defeasible result, based on lexical knowledge of revolt and pacify, is that the corresponding events did not overlap in time. This result is then available to the rules and axioms that produce discourse relations. In DICE there are also defeasible laws on discourse processes, which specify interactions of temporal, causal, and lexical phenomena, and are considered part of a reader's linguistic knowledge. Some examples are shown in Table 2.19.

14 See [LA93] for references concerning how these derivations are modeled.
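Before turning to these laws, the filtering role of the axioms in Table 2.17 can be illustrated with a small sketch. It is ours, not part of DICE: the axioms are reduced here to a single yes/no question about whether e_α precedes e_β, which glosses over the difference between "does not precede" and "follows", and the genuine adjudication between conflicting rules is left to the deduction principles discussed below (Table 2.20).

    AXIOMS = {                      # relation -> required answer to "does e_alpha precede e_beta?"
        "Narration":   True,
        "Result":      True,
        "Explanation": False,       # the cause e_beta may not follow its effect e_alpha
        "Elaboration": False,
    }

    def consistent(candidates, alpha_precedes_beta):
        """Discard any candidate relation whose axiom contradicts the known ordering."""
        return [r for r in candidates
                if r not in AXIOMS or AXIOMS[r] == alpha_precedes_beta]

    # If it is known that e_beta caused, and so preceded, e_alpha:
    print(consistent(["Narration", "Explanation"], alpha_precedes_beta=False))  # ['Explanation']

The sketch simply drops candidates that clash with the temporal facts; in DICE itself this outcome is derived, rather than stipulated, by the deduction rules presented next.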
In States Overlap, the effect of a state in a discourse is described: if α and/or β expresses a state, then the eventualities they express will overlap in time. Maintain Causal Trajectory concerns successions of relations: if τ is updated with β, which is to be related to α, and it is known from context that some eventuality e_γ caused e_α, then e_β cannot also be assumed to cause e_α. No Cause restricts the use of the discourse connective when with respect to the direction of causal relations: read as "α when β", the law prevents e_α from being the cause of e_β.

Table 2.19: DICE: Defeasible laws on discourse processes
States Overlap: ⟨τ, α, β⟩ ∧ (state(e_α) ∨ state(e_β)) > overlap(e_α, e_β)
Maintain Causal Trajectory: ⟨τ, α, β⟩ ∧ R(γ, α) ∧ cause(e_γ, e_α) > ¬cause(e_β, e_α)
No Cause: ⟨τ, α, β⟩_when > ¬cause(e_α, e_β)

DICE deduction principles govern the interaction between the laws and axioms given in Tables 2.16-2.19. These principles are shown in Table 2.20; they define what can be derived from the defeasible laws and indefeasible axioms. The conditions of each principle (e.g. (A1)-(A3)) must all be satisfied to derive a result.

Table 2.20: DICE: Deduction rules
Defeasible Modus Ponens: from (A1) φ > ψ, (A2) φ, and (A3) the consistency of ψ with what is otherwise derivable, defeasibly derive ψ.
Complex Penguin Principle: from (B1) □(φ → χ), (B2) χ > ψ, (B3) φ > ξ, (B4) □(ψ → ρ), (B5) □(ξ → ¬ρ), and (B6) φ, defeasibly derive ξ (the consequent of the rule with the more specific conditions), and not ψ.
Nixon Diamond: from (C1) φ > ψ, (C2) χ > ¬ψ, (C3) φ, and (C4) χ, derive neither ψ nor ¬ψ.

Defeasible Modus Ponens states essentially that a reader may apply any of the laws. For example, if a reader reads Max fell. John pushed him, and if s/he derives the Push Causal Law, then the condition of that law (in this example, ⟨τ, α, β⟩ ∧ fall(max, e_α) ∧ push(john, max, e_β)) is φ, the result of that law (cause(e_β, e_α)) is ψ, and so long as nothing the reader knows contradicts these (e.g. that Max did not fall), s/he may defeasibly derive ψ (that there is a causal relation between the two clauses). Deriving the Push Causal Law requires information from the lexicon to support the causal relation between "to push" and "to fall". Furthermore, the Causes precede Effects axiom must not be violated by knowledge, for example, that the falling preceded the pushing.

Once ψ is derived, Defeasible Modus Ponens can be used again to derive an Explanation relation from the definition of Explanation relations given in Table 2.16. Defeasible Modus Ponens will also allow the derivation of a Narration relation; in fact, the conditions for an Explanation relation logically entail the conditions for a Narration relation, for the conditions for a Narration relation (⟨τ, α, β⟩) are a subset of the conditions for an Explanation relation (⟨τ, α, β⟩ ∧ cause(e_β, e_α)). However, the Axiom on Narration requires Max's falling to precede John's pushing him, while the Axiom on Explanation (and the Causes precede Effects axiom) requires these events to occur in the opposite order. The Complex Penguin Principle resolves these inconsistencies, by stating that if the conditions of one law or discourse relation logically entail the conditions of another, but other laws or discourse relations make the two relations inconsistent, the rule with the most specific conditions wins: in this case, Explanation, because (B1) is satisfied by the conditions on Explanation entailing the conditions on Narration; (B2) and (B3) are satisfied by the conditions on Narration defeasibly implying a Narration relation, and the conditions on Explanation defeasibly implying an Explanation relation, respectively; (B4) and (B5) are satisfied by the Axiom on Narration indefeasibly implying one event ordering, and the Axiom on Explanation indefeasibly implying the negation of that event ordering; and (B6) is satisfied because the conditions for an Explanation hold. Therefore an Explanation relation, but not a Narration relation, holds.

The Nixon Diamond represents incoherent discourses, by forbidding the assumption of a law or discourse relation if it results in a direct contradiction with another assumed law or discourse relation. For example, if a Result relation is inferred between the first two clauses in (2.86), then the Maintain Causal Trajectory law (C1) on discourse processes assumes the third clause cannot also be the cause of the second clause, so if an Explanation relation (C2) is inferred between the second and third clauses, incoherence results. Assuming the conditions of both rules are satisfied (e.g. the tripping, falling, and pushing occurred), the Nixon Diamond requires that causality (or the lack of it) between Max falling and John pushing him cannot be established.

(2.86) John tripped Max. Max fell. John had pushed him.

2.6.5 Extending the Theory to Cognitive States

DICE consists of a modal propositional language augmented with >, defeasible implication, which computes discourse relations in an SDRS from the compositional semantics of its clauses. However, DICE does not have full access to the formal language of SDRSs, only to the form of the information it contains. The conditions (topic-hood and discourse relations) of an SDRS K_α are translated into predicates of a proposition variable p_α, where α labels K_α.

[LA99] note that if DICE did have full access to the language of SDRSs, discourse interpretation would be undecidable, because DICE computes discourse relations using reasoning about partial information in a modal propositional language, while the language of SDRSs is first order. To perform consistency checks over a first order language, DICE would also have to be first order, and this would make discourse interpretation go beyond what is recursively enumerable. The same reasoning leads them to claim that the logic used for computing beliefs and intentions must also be "shallower" than the logic that models their content; because one lacks direct access to another person's cognitive state, default reasoning is necessary and thus consistency checks are needed.

As in DICE, [LA99] use a modal propositional language augmented with > to compute cognitive states. The propositional variables in this language are indexed to cognitive states and to discourse content: just as in DICE the conditions of an SDRS K_α are translated into predicates of the proposition variable p_α, so in the language for computing cognitive states the proposition variable p_α is indexed to α. An interpretation in this language then makes B_S(p_α) true only if the worlds assigned to p_α are also the ones that make the content of K_α true. Modal operators B (believes) and I (intends) operate over these propositional variables; B_S(p_α) thus corresponds to S believing the content represented in the SDRS K_α. The authors assume that if S intends this content, he does not already believe it is true, i.e. I_S(p_α) → ¬B_S(p_α).

Defeasible principles can then account for links between cognitive states and the content of clauses in dialogue. For example, a principle of "Cooperativity" states that H will normally adopt S's goals, and if not, H will normally indicate this to S; roughly, I_S(φ) > I_H(φ). As [GS86] note, the possible goals a speaker may have are infinite. [LA93] note however that certain types of speech acts have goals which one can compute; for example, the goal of a speaker saying (represented with :) a question (represented with ?α) is to know an answer (represented here as φ), i.e., roughly: S:?α > I_S(B_S(φ)). Similarly, Grice's maxims15 such as "Be Sincere", and actions to achieve goals, such as the Practical Syllogism (i.e. if S intends ψ and believes that φ normally implies ψ, then S intends φ), can be represented as defeasible principles, enabling reasoning between cognitive states and the informational content of clauses.

2.6.6 Summary

In this section, we presented an alternative theory of discourse coherence, one which models discourse coherence without invoking the notion of a discourse tree structure. The theory replaces the syntactic structure of discourse with a structured semantics. Vendler's characterization of abstract object reference was presented and extended, and its major divisions were shown capable of being modeled in SDRT as reference to substructures or event variables. The theory further models world knowledge and the inference of propositional relations in terms of indefeasible axioms and defeasible rules. Interaction of the structured semantics and inference system is regulated by relation-specific formal rules of construction, and the availability of anaphoric reference is dependent on
15 We will discuss Grice's maxims in detail in Chapter 5.
    • relations and the structures they create. The theory further extends itself to the inference of cognitive states, thereby modeling the links between every major characterization of discourse coherence that we have so far seen in this chapter. 2.7 Discussion So far in this chapter, we have overviewed how a variety of discourse models characterize all or some part of discourse coherence, and have summarized what they do and don’t do in relation to other models. In this section, we discuss unresolved issues pertaining to the approach of these models to discourse anaphora and discourse relations, and present a new model of a rich intermediate level of discourse, which provides a means of resolving the issues posed, while leaving a number of open questions. In subsequent chapters we undertake the process of answering some of the remaining open questions. 2.7.1 Proliferation of Discourse Relations As [Keh95] notes, in 1748, [Hum48] discerned three basic connections that can exist between ideas: Resemblance, Cause or Effect, and Contiguity (in time or place). Since then, as we have seen in the prior sections, many alternative ways of categorizing the atomic set of discourse relations have been proposed. [Lon83] and [MT88], for example, each provide a unique characterization of the “deep” semantic relations between propositions that underlie the surface structure of text. [MT88]’s characterization,which is also intended to cover presentational, or intentional, relations, is flexible about the number and type of possible relations that exist. In both of these theories it is argued that morpho-syntactic cues do not reliably indicate these discourse relations. [HH76], in contrast, derives what he claims to be a complete, though different, set of semantic relations between propositions from the “surface cues” available in a text. [Mar92] combines the two approaches to derive yet another set of discourse relations, by claiming a “deep” relation exists at a place in the text if an explicit surface cue can be inserted there. The approach taken by these authors towards defining a useful set of discourse relations is different than the approach taken by the remaining authors surveyed in this chapter, in that the former aim to describe the possible discourse relations 64
    • that could exist in a text in terms of their own intuition. Clearly, their intuition displays a significant degree of variation. On the one hand, as [Keh95] notes, many of these variations can be viewed as terminological variants of each other. And as [Kno96] notes, some flexibility is desirable, at least until a particular set of relations is proven useful and possibly even to allow for language evolution. Moreover, certain relations may prove more useful in some domains than others, and each set is not in all cases claimed to cover exactly the same range of discourse coherence, or apply to the same constituents, as the others. [Kno96] provides a useful comparison of a number of different sets, including those overviewed in this chapter. On the other hand, as ([Keh95]) notes, the variation illustrates a common objection to this “laundry-list” approach to understanding discourse coherence: without an explanatory basis for producing and constraining the production of a particular set of atomic discourse relations, and a characterization of how more complex relations can be derived from them, it is impossible to objectively select one particular set over any other. For example, in such an approach, it would be entirely possible to claim the existence of a relation defined explicitly for (2.87). As ([Kno96]) defines it, such a relation could be called: inform accident and mention fruit. (2.87) John broke his arm. I like plums. ([Kno96, 35]) Thus, while the “laundry list” approach is useful for perceiving the range of ways in which constituents are related in coherent discourse, an approach that describes the mechanisms involved in deriving, constraining, and combining discourse relations potentially has the additional benefit of yielding a distinction between the different kinds of discourse relations, and constituents, that should be associated with each mechanism. This is the approach to modeling discourse coherence taken by the remaining authors discussed in this chapter; their association of discourse relations with mechanisms likely explains why these models make use of a significantly smaller number of discourse relations when describing the mechanisms associated with them than do the purely descriptive models. Nevertheless, there continues to be considerable variation in the number and type of discourse relations these discourse models define as well. As we saw in prior sections, [Hob90] associates four propositional relations with 65
a process of discourse inference, [Keh95] reduces this set to three, and defines them in terms of constraints which [HSAM93]'s inference system determines to be satisfied or not. [SSN93] focus on the cognitive resources underlying the production of intentional and propositional relations, identifying them with four underlying features. [GS86] allows for an infinite number of intentional relations between discourse segments, while distinguishing only two structural relations, and two propositional relations between information conveyed by segments. [Pol96] distinguishes three structural relations between discourse units, and an unspecified number of propositional relations. [LA93] associates five propositional relations with a rule-based logic for discourse inference, and an infinite number of intentions with a rule-based logic for inferring cognitive states. More generally, [Hov90] surveys over 350 different discourse relations that have been proposed in the literature.

2.7.2 Use of Linguistic Cues as Signals

The argument against using explicit "cue phrases" as signals of discourse relations between discourse units (generally defined as clauses and sequences of clauses) has two parts. First, a single cue phrase may not unambiguously signal a single discourse relation. For example, the coordinating conjunction, and, notoriously places very few constraints on the discourse relation that can be derived between the clauses it connects. Second, the presence of these cue phrases is not obligatory; discourse relations can be derived in an appropriate context when the associated cue phrase is not present. [Mar92]'s insertion test provides a resolution for the second argument, but not the first. Nevertheless, many cue phrases are so clearly associated with discourse relations (e.g. "as a result" → Result relation) that, as we saw, most of the mechanism-based models make use of, or at least acknowledge, this association. For example, even though RST's authors argue against the use of cue phrases as signals of RST relations, extensions of RST [Mar00, Mar97, SdS90] have shown that linguistic cues can be used to manually and automatically label RST relations.

There is in fact a wealth of literature concerning the different ways cue phrases can be used to signal both propositional and intentional relations. [Coh84], for example, argues that cue phrases can function to reduce the processing load on the hearer and facilitate recognition of the argument structure of a discourse. [EM90] identifies and associates pragmatic features with a variety of cue phrases and shows that these features can be used to implement the use of cue phrases in an argumentation-based text generation system. [Hov95] argues that cue phrases can signal in parallel semantic, goal-oriented, attentional and rhetorical discourse relations, all of which can potentially yield distinct structural analyses; the speaker must select those which minimize the overall structural ambiguity for the hearer. [Bla87] argues that cue phrases indicate how the relevance [SW86] of one proposition is dependent on the interpretation of another. In [Sch87]'s multi-dimensional discourse model, cue phrases index each utterance to the speaker and hearer along a variety of pragmatic and semantic planes in the surrounding discourse. [Fra88, Red90, vD79] characterize how a variety of cue phrases signal pragmatic and propositional relations. [MS88, LO93] provide a detailed investigation of the formal semantics of temporal cue phrases, and [Lag98] provides a detailed investigation of the formal semantics of causal cue phrases.

Building on these semantic investigations, [Kno96] develops a theory of coherence relations which provides both a solution to the problem that cue phrases are unreliable because they can be ambiguous, and a solution to the problem of the proliferation of discourse relations. Knott first defines an intuitive test to isolate a corpus of cue phrases. Human annotators use the test, which isolates every phrase that modifies a clause or sentence in a naturally-occurring text, together with its host clause. If the annotators judge that the isolated unit cannot be interpreted without further context, but can be interpreted if the selected phrase is removed, then the phrase is a cue phrase (discourse connective). Knott's corpus contains coordinating and subordinating conjunctions, and a wide variety of prepositional and adverbial phrases, in addition to a small number of relative clause markers and other phrases that take sentential complements.

Drawing on [Mar92]'s insertion test, Knott then uses a substitution test to organize the corpus into a hierarchical taxonomy. The test selects contexts in which a cue naturally occurs, and a reader decides whether that cue could be replaced with another cue in those contexts without changing the meaning of the discourse. By testing a variety of cue phrases in a variety of contexts, a hierarchical taxonomy is formed, in which any cue x can be characterized as synonymous with, a hypernym of, a hyponym of, exclusive with, or contingently inter-substitutable with any other cue y. By investigating the inter-substitutability contexts and drawing on [SSN93]'s cognitive approach to
    • discourse relations and the semantics of cue phrases mentioned above, Knott defines eight binaryvalued cognitive features that characterize the possible discourse relations and associates the lexical semantics of cue phrases with values for these features, by interpreting the taxonomy e.g. as follows: if two cues are synonymous, then they share the same values of all features; if two cues are exclusive, š š š š which in all features for which › the value of is a hypernym of , then › then for at least one feature they take opposite values; if a cue shares is defined, but additionally has a value in a feature for is not defined, etc. Feature values for three connectives are illustrated in Table 2.21. Table 2.21: [Kno96]’s Features of Discourse Connectives Feature Source of Coherence Anchor of Relation Pattern of Instantiation Focus of Polarity Polarity of Relation Presuppositionality Modal Status Rule Type as a result semantic cause bilateral counterpart positive non actual causal Discourse Connective unfortunately furthermore semantic pragmatic result unilateral unilateral negative non actual causal positive non actual ind? then – positive – An undefined feature value is represented as “–”; an uncertain value is represented with a blank cell, and a “?” indicates that the value is likely but requires further study. The “source of coherence” feature represents whether the semantics of the connective asserts that the reader is intended to believe that the relation holds (semantic) or world knowledge already indicates that the relation holds (pragmatic). The remaining features relate to Knott’s argument that every discourse relation between two discourse spans A and B corresponds to the filling out of one of two types of defeasible rules, causal or inductive (i.e. generalizations), specified by the “rule type” feature. These rules are t ... Pn u t of the form: P1 Q. The remaining features specify how these rules are filled out, i.e. which span (A or B) yields P and how, and which span yields Q and how. For example, the “anchor” features represents the fact that A corresponds either to some P (cause) and is known, or Q (result) and is desired. The “pattern of instantiation”, “focus of polarity” and “polarity” features roughly indicate whether C or its negation is on the same or opposite side of the rule as A or its 68
2.7.3 Structural and Anaphoric Cue Phrases

DLTAG [FMP+01, CFM+02, WJSK03, WKJ99, WJSK99, WJ98] proposes a discourse model that will provide the foundation for the studies in the remainder of this thesis. The DLTAG model incorporates many of the insights of the models and investigations discussed above, but it also displays some significant novel insights that enable the incorporation into the model of viable solutions to both the proliferation of discourse relations and the ambiguity of cue phrases, and the incorporation of solutions to other problems not previously addressed by other models.

At the core of DLTAG is the insight that discourse connectives can be modeled as predicates, akin to verbs at the clause level, except that they can take clauses as their arguments. DLTAG [FMP+01] currently models this syntax using the structures and structure-building operations of a lexicalized tree-adjoining grammar (LTAG) [JVS99], which itself is an extension of TAG [Jos87], and is widely used to model the syntax of sentences. We will present the LTAG and DLTAG models in detail in Chapter 4, and a syntax-semantic interface for DLTAG will be discussed, drawing on the discourse-level interface presented in [Gar97b] and the sentence-level interface presented in [KJ99]. Briefly, in a lexicalized TAG, there are two types of elementary trees: initial trees that encode basic predicate-argument relations, and auxiliary trees that encode recursion. Each elementary tree has at least one anchor: the lexical item(s) with which it is associated. A lexicalized TAG provides two structure-building operations to create complex trees: substitution (indicated by ↓) and adjunction (indicated by ∗).

In DLTAG, the anchor for an elementary tree may be a cue phrase or a feature structure that is lexically null, in which case an inferred relation may be represented in terms of [Kno96]'s features. DLTAG distinguishes three types of elementary tree structures, exemplified in Figure 2.15 with cue phrases as anchors.
As the figure exemplifies, subordinating conjunctions (e.g. because), coordinating conjunctions (e.g. and)16, and lexically null feature structures are modeled as "structural connectives", i.e. predicates that retrieve both arguments structurally. The semantics associated with these cue phrases and their arguments can be computed compositionally.

Figure 2.15: Elementary DLTAG Trees (elementary trees anchored by because, and, and as a result)

Adverbials (e.g. as a result), on the other hand, are modeled as "anaphoric connectives", i.e. predicates that retrieve only one argument structurally, the discourse unit they modify. The other argument must be retrieved anaphorically. The semantics associated with these cue phrases and their arguments cannot be fully computed until the anaphoric argument is resolved. To attach to the growing discourse tree, they must adjoin to the right argument of a structural connective. If no overt structural cue phrase is present, the structural argument of the adverbial is attached to the discourse structure via a lexically-empty elementary tree structurally identical to the tree for and in Figure 2.15, which conveys continuation of the description of the larger tree to which it is attached. Although a more specific relation may be inferred and represented as features in the tree, the relation provided by the syntax alone is semantically under-specified, analogous to the semantics of noun-noun compounds.

DLTAG's distinction between structural and anaphoric connectives is based on considerations of computational economy and behavioral evidence such as found, for example, in the case of multiple connectives ([WKJ99]). In (2.88), taken (in simplified form) from [CFM+02], because encodes a causal relation between two eventualities, Q = RaiseIre(Sally, Friends) and R = Enjoys(Sally, Cheeseburger), and nevertheless encodes a violated expectation relation between R = Enjoys(Sally, Cheeseburger) and P = Subscribes(Sally, VegetarianTimes).

(2.88a) Sally subscribes to Vegetarian Times.
(2.88b) Lately, she's raised the ire of her vegan friends
(2.88c) because she nevertheless enjoys the occasional bacon cheeseburger.

16. This view will be further qualified in Chapter 4.
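To make the structural/anaphoric contrast concrete, the toy sketch below — ours, not part of the DLTAG machinery — shows how the two connective types in (2.88) obtain their arguments: because takes two structurally adjacent arguments, while nevertheless takes its host clause structurally and must resolve its other argument against the prior discourse.

```python
# A toy illustration (ours, not DLTAG's implementation) of structural versus
# anaphoric connectives, using the eventualities of example (2.88).

P = "Subscribes(Sally, VegetarianTimes)"
Q = "RaiseIre(Sally, Friends)"
R = "Enjoys(Sally, Cheeseburger)"

def structural_connective(name, left, right):
    """Both arguments are supplied by adjacent structure (e.g. because, and)."""
    return (name, left, right)

def resolve_anaphorically(history):
    # Placeholder resolution: in (2.88) the antecedent of nevertheless is P,
    # which is not adjacent to the clause that nevertheless modifies.
    return history[0]

def anaphoric_connective(name, host, history):
    """Only the host clause is supplied structurally (e.g. as a result,
    nevertheless); the other argument is resolved against the prior discourse."""
    antecedent = resolve_anaphorically(history)
    return (name, antecedent, host)

history = [P, Q]                                         # discourse processed so far
print(structural_connective("because", Q, R))            # causal relation between Q and R
print(anaphoric_connective("nevertheless", R, history))  # violated expectation between P and R
```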
    • To model both predicates structurally would create a directed acyclic graph, which goes beyond the computational power of LTAG and creates a completely unconstrained model of discourse structure [WJSK03]. However, preliminary investigations into the behavior of cue phrases reveal that while subordinating and coordinating conjunctions seem to always and only take adjacent discourse segments as their arguments, adverbial cue phrases seem to share many properties with anaphora. For example, their left argument may be found intra- or inter-sententially, as shown in (2.89) and (2.90), respectively. Moreover, their left argument can be an inferred situation, as shown in (2.91) (the situation in which the addressee does not want an apple), and it can even be derived from the interpretation of a discourse relation between two segments, as shown in (2.92) (the result relation imparted by so). (2.89) A person who seeks adventure might, for example, try skydiving. [[WJSK03]]   (2.90) Some people seek adventure. For example, they might try skydiving. [[CFM 02]] (2.91) Do you want an apple? Otherwise, you can have a pear. [[WJSK03]] (2.92) John just broke his arm. So, for example, he can’t cycle to work now. [[WJSK03]] DLTAG does not claim to provide a complete model of discourse; it is not committed to the view of the entire discourse as a single tree, tree structures may only be built within segments (akin perhaps to [KOOM01]), and multiple trees may be possible, for example at an informational and intentional level, as [MP92] have shown. While DLTAG is not committed to a particular view of discourse structure, it is very committed to the idea that a rich intermediate level between highlevel discourse structure and clause structure, namely, the syntax and semantics associated with cue phrases, must be specified and recoverable in order to interpret a discourse. Moreover, because cue phrases are some of the clearest indicators of discourse structure, and their arguments can be   reliably annotated [CFM 02], large-scale annotation studies will provide information about the range of possible discourse structures. 2.7.4 Comparison of DLTAG and Other Models Unlike the discourse models discussed in prior sections, DLTAG does not claim to provide a complete model of discourse. Rather, it proposes an intermediate level of discourse structure and inter71
pretation that can be built directly on top of the clause structure and interpretation. Because DLTAG builds a tree structure for discourse, it is similar in computational power to the other tree-based discourse grammars discussed in this chapter. But although DLTAG, like many other models, argues that (an intermediate level of) discourse can be modeled in terms of syntax and semantics, DLTAG alone defines this discourse structure and interpretation in terms of the same mechanisms that are already used at the sentence level. Moreover, DLTAG alone argues that not all relations can be modeled structurally in a tree-based discourse model; DLTAG decouples anaphoric and structural connections between discourse segments, and thus only the DLTAG tree is able to model relations between a discourse segment and multiple prior segments within a single tree.

Other tree-based models also introduce unnecessary redundancy in terms of the additional mechanisms they propose to build discourse. These models make use of a predefined set of discourse relations, and cue phrases are treated as "signals" of these predefined relations. The dependency of these relation definitions on the presence or inference of a cue phrase is clearly visible, such as in the "otherwise" relation proposed in RST [MT88]. Numerous other examples of this are found in [Mar97]'s instructions for manually labeling RST relations. However, if, as in DLTAG, the syntax and semantics of cue phrases are taken into account, then it is redundant to postulate additional relation definitions and grammar or semantic rules to create discourse relations.

Moreover, the RST manual also illustrates cases where discourse relations between discourse segments are redundant even in the absence of a cue phrase. As illustration, in (2.93)-(2.95) (from [Mar97]), a discourse deictic (italicized) in (b) refers to (a), and its predication (bold-faced) is synonymous with the RST relation (capitalized) between (a) and (b) (nucleus and satellite order is represented by n and s).

s INTERPRETATION n
(2.93 a) All evidence points to the fact that Kennedy was assassinated by the CIA.
(2.93 b) This suggests to me that the organization is untrustworthy.

n EXPLANATION-ARGUMENTATIVE s
(2.94 a) Most of the dinosaurs died about 65,000,000 years ago.
(2.94 b) Some researchers assume that the impact of a big meteorite caused this.
    • n EVALUATION s (2.95 a) Features like our uniquely sealed jacket and protective hub ring make our discs last longer. And our soft inner liner cleans the ultra-smooth disc surface while in use. (2.95 b) It all adds up to better performance and reliability. [KOOM01] suggest that discourse deixis use is not a discourse relation, but rather a form of “resumption”. However, their data only contains cases concerning the RST elaboration objectattribute relation. In the following examples from [Mar97], we find similar redundancy, although there is no discourse deixis, but rather a nominal reference in (a) to a noun in (b). Again, the predication and/or reference is synonymous with the labeled RST relation. n/s CAUSE-RESULT n/s17 (2.96 a) Unfortunately for the athlete, the anaerobic metabolism of carbohydrates can yield a buildup of lactic acid, which accumulates in the muscles within two minutes. (2.96 b) Lactic acid and associated hydrogen ions cause burning muscle pain. n INTERPRETATION s (2.97 a) Steep declines in capital spending commitments and building permits pushed the leading composite down for the fifth time this month. (2.97 b) Such a decline is unusual at this stage in an expansion. n EVALUATION s (2.98 a) Policy makers have four options ... (2.98 b) The last of these is ultimately the only sustainable option. ELAB n PROC-STEP s (2.99 a) A user should invoke the program with the name of the file and the name of the file to be created. 17 [Mar97, 24] states that the writer’s intentions are unclear as to whether this cause or this result is the nucleus. 73
    • (2.99 b) The process opens the existing file, creates the new file, and creates a child process. What the RST relations in these examples do is simply restate the clause-level semantics, rather than describe new information about the link between the clauses that is not already represented in the semantics of the clause itself. Because most of the models discussed above do not incorporate clause level semantics into their discourse interpretation, they are likely to postulate what is in fact a redundant “discourse relation” in such cases. But because DLTAG builds discourse structure directly on top of clause structure, this semantics is available to the discourse interpretation; in DLTAG the “relations” between (a) and (b) in the above cases would be represented by an empty connective, signaling “continuation” of the discourse. By retaining the semantic interpretations of the clauses and the information about how the referential links between them are resolved, the need to supply an additional RST-type relation is removed. The models of inference discussed earlier in this chapter already make some use of the semantic contributions of some cue phrases. [HSAM93], for example, mention using the propositional content of because, and [GS86] mention using the propositional content of but. SDRT [LA93] also incorporates the semantic contributions of some cue phrases into their structured semantics and inference system, and does allow multiple relations between discourse segments, but its use of only structural attachments and predefined discourse relations again produces redundancy. For example, the association of a structural “result relation” with an adverbial cue phrase such as as a result will be shown in Chapter 3 to be redundant. In fact, as will become apparent after the discussion in Chapter 3, SDRT can already handle the semantics of adverbial cue phrases by extending to them the semantics they employ for discourse deixis. More generally, an incorporation of the semantics of all discourse connectives would likely reduce the complexity of inference systems; if the semantics of a connective can assert a relation between clauses, this relation no longer needs to be inferred. [Kno96]’s proposal to decompose discourse relations into features that are attributed to the semantics of cue phrases is model-independent and thus can be incorporated into DLTAG. However, there are a number of problems mainly resulting from his use of an intuitive test to isolate cue phrases which must first be resolved. For example, Knott’s list includes lexical items from all five syntactic categories originally noted by [Qui72] as containing cue phrases: coordinating and 74
    • subordinating conjunctions, adverb and prepositional phrases, and phrases which take sentential complements (e.g. it follows that, all told). If the interpretation of verbs and small clauses are available to the discourse model, as is the case in DLTAG, then attributing them additional features of a discourse relation becomes redundant; any presuppositions of these verbs (e.g. that if x follows, š then follows something) will already be available at clause level. Similarly, Knott’s intuitive test overlooks the fact that the semantics of complex cue phrases can be treated compositionally. For example, his list includes complex relative clause complementizers (e.g. especially which), as well as complex subordinating conjunctions (e.g. especially if/when, even if/when, only if/when, etc). Adverbs such as especially, even, only can attach to many cue phrases; listing each resulting combination individually is unnecessary if the compositional semantics of cue phrases is taken into account. Moreover, like verbs and small clauses, if the semantics of complementizers is available to the discourse model, then attributing them features of a discourse relation is redundant. More generally, Knott’s intuitive test enables other mechanisms of discourse coherence, such as inference, implicature, and intonation, to be conflated with the semantics of cue phrases, causing errors of comission and omission. For example, Knott erroneously includes unfortunately, and surprisingly, but not unhappily or not surprisingly. Moreover, investigation of naturally occurring cases of unfortunately reveals that some feature values are incorrect. Inclusion of unfortunately as a cue phrase would require it to be undefined for the majority of features, akin to and. While and is clearly uninterpretable in isolation with its host clause, unfortunately is not. In addition, Knott’s features cannot fully describe the idiosyncratic meaning of each connective. [JR98], for example, argue that the features incorrectly describe the use of donc (similar to therefore) in French, because further semantic properties of the adverb must be taken into account. Finally, it is not clear whether Knott intends to associate inferred defeasible rules with all cue phrases; although the value of the ‘rule type’ feature is left blank for adverbial temporal connectives (e.g. then, next), they need not be defined in relation to feature values that fill out the semantics of defeasible rules. In essence, DLTAG argues that it is the semantics of cue phrases which drives the construction of a discourse meaning. Associating cue phrases with semantics solves both the proliferation of discourse relations and the ambiguity of cue phrases as “signals” of these relations. It is no longer 75
    • necessary to “map” the use of cue phrases to separately defined discourse relations; instead, cue phrases supply relations in their semantics, and like all predicates, the meaning of the relation they supply can be vague or precise relative to other predicates, and it can also be over-loaded. For example, the verb go at the clause level has a variety of meanings, most of which can be stated more precisely by other action verbs. And because DLTAG views inference as another mechanism separate from compositional semantics, additional relations can be inferred whether or not a cue phrase is used. Incorporating the intermediate module DLTAG proposes into other modules will yield a more computationally economical and more observationally valid complete discourse model. 2.7.5 Remaining Questions Because the semantics of cue phrases drives the construction of discourse meaning in DLTAG, each cue phrase must be associated with a semantics. While, as Section 2.7.2 indicates, the semantics of subordinating and coordinating conjunctions has been investigated in great detail both at the clause and discourse level, the semantics of most adverbial cue phrases, consisting largely of adverb and prepositional cue phrases, have been mainly ignored at the clause level, and have received much less attention at the discourse level. We will see clear evidence of this in Chapter 3. In fact, part of this “attention inequality” is simply due to the fact that the sets of subordinating and coordinating conjunctions are relatively small, while the set of adverbials is a large set. In fact, as [Kno96] notes, the set of adverbials is compositional, and therefore infinite. For example, the adverb generally can be modified by innumerably many instances of very (e.g very generally, very very generally,...), each time producing a unique member of this set. It is not surprising, therefore, that while DLTAG proposes that certain adverbials function as discourse connectives, they do not isolate this subset from the set of all adverbials. Because it is not possible to answer the question of which adverbials which function as discourse connectives with a list, one must ask instead what mechanisms cause an adverbial to function as a discourse connective. In Chapter 3 we will use linguistic theory to investigate the semantic mechanisms that cause an adverbial to function as a discourse connective. This investigation will also shed light on the question of what discourse units an adverbial discourse connective relates. For while the discourse 76
    • models overviewed in this chapter make frequent use of eventuality interpretations of clauses, and frequent use of the term “discourse segment”, these interpretations and constituents are in most cases not well defined or distinguished from other interpretations or constituents. We will use discourse deixis research to better clarify both the semantic nature of the arguments of adverbial discourse connectives and the syntactic constituents from which they can be drawn. Our investigation will also shed light on the space of discourse relations imparted by a wide variety of adverbial discourse connectives, and enable a precise semantic representation of their behavioral anaphoricity. In Chapter 4 we then investigate how this semantics can be incorporated into a syntax-semantic interface for the DLTAG model. In Chapter 5, we investigate other ways in which the interpretation of an adverbial can contribute to discourse coherence. 2.8 Conclusion The discourse models presented in this chapter generally agree that discourse has a recursive structure and that this structure affects the interpretation of discourse. We have seen a number of efforts to formalize these insights, including descriptive, inference-based, syntactic and/or semantic approaches to modeling discourse relations and anaphoric constraints. We have argued that the DLTAG approach should be incorporated into these models, thereby removing the need to select a single set of “primitive” relations underlying all coherent text spans, and a single mechanism for producing them. In DLTAG’s view, the syntax and semantics of cue phrases provides one way of producing coherence between discourse units, and discourse inference provides another. By understanding how different modules interact with each other and with other characteristics of discourse to produce coherence, we can then begin to understand how complete and coherent discourse interpretations are produced. 77
    • Chapter 3 Semantic Mechanisms in Adverbials 3.1 Introduction In Chapter 2, we described similarities and differences between a variety of models of discourse coherence, which, taken together, distinguish different modules required to build a complete inter-     pretation of discourse. We introduced DLTAG ([FMP 01, CFM 02, WJSK03, WKJ99, WJSK99, WJ98]) as a theory that bridges the gap between clause and discourse interpretations, by using the same syntactic and semantic mechanisms that build the clause interpretation to build an intermediate level of discourse interpretation on top of the clause interpretation. In DLTAG, cue phrases, or discourse connectives, are predicates, like verbs, except they can take interpretations of clauses as arguments. For coordinating and subordinating conjunctions, both arguments come structurally. For adverbial cue phrases, which are mainly adverb (ADVP) and prepositional (PP) phrases, only one argument comes structurally. Based on consideration of computational economy and behavioral evidence, DLTAG argues that the other argument of these adverbials must be resolved anaphorically. However, while DLTAG proposes that certain adverbials function as discourse connectives, it does not isolate this subset from the set of all adverbials. Because the set of adverbials is compositional, and therefore infinite ([Kno96]), it is not possible list the adverbials that function as discourse connectives. In this chapter, we present a corpus-based investigation of the semantic mechanisms that cause certain adverbials, which we call discourse 78
    • adverbials, to function as discourse connectives, while other adverbials, which we call clausal adverbials, do not function as discourse connectives. In Section 3.2 we review the function and structure of adverbials and describe how these properties were used to extract a data set from a parsed corpora. In Sections 3.3-3.4 we overview major syntactic and semantic issues that have been addressed in clause-level analyses of adverbials, and introduce an extension to these analyses that incorporates the discourse deixis research introduced in Chapter 2 and defines the properties of semantic objects whose anaphoricity can cause adverbials to function as discourse connectives. Following DLTAG, in this extension, anaphoricity is defined as the use of “discourse connecting” devices such as anaphoric reference and presupposition by adverb and prepositional forms to retrieve objects at the discourse level, just as they are employed for the retrieval of objects at the clause level. In Sections 3.5-3.6, we incorporate into this extension discourse-level analyses that have already been proposed for a small variety of adverbial discourse connectives and present a range of semantic objects that compose adverbials and a range of anaphoric devices that determine if and how these objects relate to the surrounding discourse. We conclude in Section 3.7. 3.2 Linguistic Background and Data Collection This section provides the linguistic background upon which the studies in subsequent sections are built. We first discuss ways in which adverbial function differs from other functions of ADVP and PP and review the grammatical structure of ADVP and PP adverbials. We then describe how these properties were used to extract the data set studied in this thesis from a corpus of natural language. 3.2.1 Function of Adverbials Adverbials are adjuncts, elements whose presence in a clause is not obligatory for clause interpretation. They function as modifiers, elements that supply additional information about the element they modify. Adverbials frequently modify verb phrases (VP) or sentences (S). The term adverbial denotes a functional rather than a syntactic category. As exemplified in (3.1)-(3.8), a variety of syntactic categories can function as adverbials: in addition to ADVP and PP 79
    • (3.1)-(3.2), which are the focus of this thesis, finite clauses (3.3), non-finite clauses (e.g. infinitives (3.4), -ing participles (3.5), -ed participles (3.6)), verb-less clauses (3.7), and noun phrases (3.8) can also function as adverbials ([Ale97]). (3.1) Hopefully, there will never be another world war. (3.2) In most situations, John remains calm. (3.3) John worked late although he was very tired. (3.4) John plays to win. (3.5) Reading, John relaxes. (3.6) Urged by his mother, John did the dishes. (3.7) John ran into the street, unaware of the danger. (3.8) John came Tuesday. ADVP and PP do not always function as adverbials; for example, ditransitive verbs, as shown in (3.9), as well as verbs of behavior, movement, and situation, as shown in (3.10) - (3.12), may lexically sub-categorize for an ADVP or PP ([MG82]). (3.9) He gave a car to Mary. (3.10) He behaved awfully/in an unexpected way. (3.11) He resides nearby/at my house. (3.12) He dresses well/in slacks. Moreover, as shown in (3.13)-(3.18), some ADVP and PP (italics) can be used to modify phrasal categories (bold-face) other than VP (3.13) and S (3.14), including noun phrases (NP) (3.15), adjective phrases (AP) (3.16), and other PPs (3.17) and ADVPs (3.18) ([ODA93]). (3.13) They worked quickly/in a frenzy. (3.14) Probably/In all likelihood, he will survive. (3.15) Even dogs in captivity eat bones. (3.16) She is completely crazy about him. (3.17) He sent flowers right/over to his enemies. (3.18) At least once, he fell very seriously in love. 80
It is also important to note that ADVP and PP don't always adjoin to the element being modified. Although every adverbial may not be licensed in every position, four positions within the clause are available to S- and VP-modifying ADVP and PP adverbials ([Ale97]), as shown in (3.19). As [Ale97] discusses in more detail, each of these positions may also have a parenthetical counterpart, in which the adverbial is set off by commas (or pauses) from the rest of the clause. Again, however, not all adverbials are licensed in every parenthetical position ([Ale97]). As discussed below and in subsequent chapters, our data set includes both S-initial adverbials that are comma-delimited and S-initial adverbials that are not comma-delimited, and we do not distinguish them semantically1.

(3.19) S-initial: Of course John was hurt.
S-medial (before auxiliary): John definitely was hurt.
S-medial (after auxiliary, before main verb): John was probably hurt.
S-final (after main verb): John was hurt slightly.

3.2.2 Structure of PP and ADVP

Generally in linguistics, the grammatical structure of a phrase is represented with a phrase structure rule2. Such a rule is shown in (3.20), where XP represents a phrasal category, and X represents the minimal element, or head, around which the rest of the phrase is built. The arrow reads as "consists of (in the order shown)", and parentheses indicate optional elements: SPEC abbreviates specifier, which is defined as an additional phrase used to make the meaning of the head more precise, and COMP abbreviates complement (or internal argument), which is defined as an additional phrase used to supply information that is already implied by the meaning of the head ([ODA93]).

(3.20) XP → (SPEC) X (COMP)

Prepositions (P) in English, which correspond to the head of a PP, are a closed class of lexical items. Some examples of prepositions are shown in (3.21).

(3.21) about, after, as, at, by, before, down, for, from, in, of, on, over, since, until, ...

1. See [FMP+01] for further discussion of DLTAG's extraction of S-internal adverbial discourse connectives.
2. See e.g. [ODA93] for further details.
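As a minimal illustration of the schema in (3.20), the sketch below encodes a phrase as a head with an optional specifier and an optional complement; the phrase built here, right at the door, is our own illustrative example rather than one drawn from the corpus.

```python
# A minimal sketch of the schema in (3.20), XP -> (SPEC) X (COMP).
# The phrase built below ("right at the door") is our own illustrative example.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Phrase:
    category: str                      # e.g. "PP", "ADVP", "NP"
    head: str                          # the lexical head X
    spec: Optional[str] = None         # optional specifier
    comp: Optional["Phrase"] = None    # optional complement (internal argument)

    def words(self):
        out = [self.spec] if self.spec else []
        out.append(self.head)
        if self.comp:
            out.extend(self.comp.words())
        return out

np = Phrase("NP", head="door", spec="the")           # simplified: determiner treated as SPEC
pp = Phrase("PP", head="at", spec="right", comp=np)  # SPEC "right", head "at", COMP the NP
print(pp.words())                                    # ['right', 'at', 'the', 'door']
```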
Adverbs (ADV), which correspond to the head of an ADVP, are generally classified into two morphological types: non-derived, or unsuffixed, and derived, including suffixed and compound forms ([Ale97]). In English, the -ly suffix predominates3; it can be affixed to most adjectives and to some nouns (in the latter case the resulting form can often be used as both an adverb and an adjective). There are other adverb suffixes as well, including -wise and -ally (the latter is used on adjectives ending in -ic). In addition, the prefix a-, which may be historically derived from the preposition on ([Suz97]), can form adjectives and some adverbs from nouns. Examples are shown in Table 3.1.

Table 3.1: Non-Derived and Derived Adverbs

Morphological Type       Examples
non-derived              often, well, today
compound-derived         therefore, however, nevertheless
adjective + -ly          briefly, fortunately, accusingly, swimmingly, *atomicly
noun + -ly               yearly, monthly, purposely, partly, kingly, *arrowly
adjective + -wise        likewise, otherwise
-ic adjective + -ally    specifically, atomically
a- + noun                ahead, apart

Perhaps because many ADVP are composed of a single adverb, the structure of ADVP is not as well studied in linguistics as PP ([ODA93]). [CL93] consider adverbs a 'minor' lexical category, where the four major lexical categories are defined by the feature system in (3.22). It is assumed that subsidiary features will distinguish adverbs and adjectives ([Ale97]).

(3.22) [+N, -V] = noun
[+N, +V] = adjective
[-N, +V] = verb
[-N, -V] = preposition

Nevertheless, specifiers are commonly found in both PP and ADVP. These specifiers are generally adverbs, as shown (italicized) in (3.23)-(3.25). In (3.25), the ADVP specifier is obligatory.

(3.23) The mailman is [PP almost/barely at the door].
(3.24) [ADVP Very/Quite frequently], I go to the movies.
(3.25) [ADVP Long ago], the earth was formed.

3. [Ale97] further notes that in Greek -a and -os predominate, in German -weise predominates, in French -ment predominates, and in Italian -mente predominates.
Frequently, prepositions take a complement. Noun phrase (NP) complements are common, as italicized in (3.26). PP and S complements (the latter creates a subordinating conjunction) are also found, shown in (3.27)-(3.28), but in some contexts a preposition may take no complement, shown in (3.29). A few adverbs also take complements, as shown in (3.30) ([Ale97]).

(3.26) Mary hikes [PP in the Green Mountains].
(3.27) The kids are [PP down in the cellar].
(3.28) Her parents arrived [PP after she left].
(3.29) Her parents came [PP over ∅].
(3.30) John succeeded [ADVP independently from our efforts].

While phrase structure rules represent the internal composition of an XP, tree-based grammars, such as [JVS99, Gro99], also represent an XP's external arguments, e.g. the phrases to which an XP can adjoin, along with the structure that results. XTAG trees for an S-adjoining PP with an internal NP argument and an S-adjoining ADVP with no internal argument are shown respectively in Figure 3.1, where (∗) indicates adjunction.

Figure 3.1: S-Adjoining PP and ADVP (an S-adjoining elementary tree for the PP as a result, with an NP substitution site, and an S-adjoining elementary tree for the ADVP consequently)

3.2.3 Data Collection

Differences between discourse adverbials and clausal adverbials cannot be attributed to their syntactic structure; as shown in Figure 3.2, both can adjoin to an S node, and the result is an S. Nevertheless, we can use their common syntax to extract the data for study. Because the set of adverbials is infinite, it is not possible to study them all; the goal of this data collection was to gather a large and representative set of the different ADVP and PP adverbials that will commonly appear
in any English corpora, with the expectation that the analysis can be extrapolated to novel tokens.

Figure 3.2: S-Adjoined Discourse and Clausal Adverbials (parse trees for As a result, people are self-centered and In spring, the lilacs bloom, with the adverbial PP adjoined to S in each)

The data studied in this thesis consists of the (correctly annotated) S-initial, S-adjoined ADVP and PP that appear at least once in the Penn Treebank I versions of the parsed WSJ and Brown corpora [PT]. These corpora were chosen because they represent a wide variety of texts, including news articles, essays, fiction, etc. In the Penn Treebank I POS-tagging and bracketing system, S-adjoined ADVP and PP are bracketed as siblings of the S they modify. In Figure 3.3, for example, then is the targeted ADVP, and in fact is the targeted PP.

Figure 3.3: S-Adjoined ADVP and PP Adverbials in Penn Treebank I

Although, as discussed above, most adverbials can be found in a variety of positions, S-adjoined ADVP and PP were chosen for study under the assumption that this is the "default" syntactic position for most adverbial discourse connectives and that the majority of adverbial cue phrases found
in other positions will also be found S-adjoined4. The positional variability of adverbials means, however, that the counts of extracted S-adjoined adverbials do not reflect the counts of these adverbials appearing in other positions. Moreover, because the underlying position of left-dislocated adverbials is not represented in Penn Treebank I parses, the counts also include some left-dislocated non-S-modifiers.

The data was collected using tgrep5, which is the UNIX command grep modified for use on syntactic parses. With tgrep, the user specifies a pattern using node names and relationships between nodes. The pattern is then matched against a corpus of syntactic parses and those parses which match are extracted. The patterns used to extract both ADVP and PP are shown in the first column (top and bottom sections) of Table 3.2; they are identical except for the extracted element.

Table 3.2: tgrep Results for S-Adjoined ADVP and PP in WSJ and Brown Corpora

TGREP Pattern                                    WSJ Tokens   Brown Tokens
TOP < (S < (ADVP $. S))                                  71           1604
TOP < (S < (ADVP $. (/^,/ $. S)))                       460           1388
TOP < (S < (ADVP $. (/^“/ $. S)))                         0              1
TOP < (S < (ADVP $. (/^,/ $. (/^“/ $. S))))               2              2
Total ADVP Tokens                                       533           2995
TOP < (S < (PP $. S))                                   372           1801
TOP < (S < (PP $. (/^,/ $. S)))                        4970           3135
TOP < (S < (PP $. (/^“/ $. S)))                           0              1
TOP < (S < (PP $. (/^,/ $. (/^“/ $. S))))                 6             10
Total PP Tokens                                        5348           4947

The relevant relationships between nodes are A < B, meaning A immediately dominates B, and A $. B, meaning A and B are siblings and A immediately precedes B. The TOP node restricts the search to main clauses. Regular expressions are indicated by surrounding the node name in slashes (/). The caret (^) anchors the regular expression at the beginning of a word. Regular expressions were used to extract cases where punctuation intervened between the adverbial and the sister S.

4. Adverbial cue phrases in other positions may serve information-structuring purposes [FMP+01]. Positional variation will be discussed further below.
5. See [PT] for documentation of tgrep, the Penn Treebank I system of POS-tagging and bracketing, and the WSJ and Brown corpora.
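The two node relationships used in these patterns can be illustrated with a small sketch. The following is our own simplified re-implementation, not tgrep itself; it checks one of the PP patterns above against a toy Treebank-style parse of In spring, the lilacs bloom (the clausal example in Figure 3.2).

```python
# A simplified re-implementation (ours, not tgrep) of the pattern
# TOP < (S < (PP $. (/^,/ $. S))): an S-adjoined PP immediately followed by a
# comma node, which is immediately followed by the sibling S that the PP modifies.

tree = ("TOP", [("S", [("PP", [("IN", ["In"]), ("NP", [("NN", ["spring"])])]),
                       (",", [","]),
                       ("S", [("NP", [("DT", ["the"]), ("NNS", ["lilacs"])]),
                              ("VP", [("VBP", ["bloom"])])])])])

def children(node):
    """The daughter constituents of a node (ignoring bare word strings)."""
    return [c for c in node[1] if isinstance(c, tuple)]

def matches(node):
    if node[0] != "TOP":
        return False
    for s in children(node):                     # TOP < S  (immediate dominance)
        if s[0] != "S":
            continue
        labels = [kid[0] for kid in children(s)]
        for i in range(len(labels) - 2):         # PP $. /^,/ $. S  (sibling precedence)
            if labels[i] == "PP" and labels[i + 1].startswith(",") and labels[i + 2] == "S":
                return True
    return False

print(matches(tree))   # True: "In spring" is an S-initial, S-adjoined PP
```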
The second and third columns in Table 3.2 show the counts of adverbials retrieved by each pattern in the WSJ and Brown corpora. As shown, a quotation mark rarely intervened between an adverbial and its sibling S, but a comma frequently intervened. Total counts of S-adjoined adverbial tokens are shown in the first column of Table 3.3. Perl scripts were written to obtain total counts of each adverbial type that occurred overall, as shown in the second column of Table 3.3.

Table 3.3: Total S-Adjoined Adverbials in WSJ and Brown Corpora

Adverbial Category        Total Tokens   Total Types
S-adjoined ADVP                   3528           849
S-adjoined PP                    10295          7424
S-adjoined ADVP and PP           13823          8273

Because they were tagged by human annotators, however, the Penn Treebank I-tagged WSJ and Brown corpora contain errors. For the purposes of this thesis, the more significant types of errors were the following6:

• Incorrect tagging. For example, numerous ADVP are incorrectly tagged as PP and vice versa.
• Incomplete bracketing. For example, numerous ADVP and PP are incorrectly tagged as RB (adverb) and IN (preposition), rather than as full phrasal categories.
• Incorrect bracketing. For example, numerous ADVP and PP are bracketed as immediately dominating the adjacent S (instead of their correct sibling relationship).

Incorrect tagging errors were corrected by hand. While in principle it would have been possible to incorporate incomplete and incorrect bracketing errors into the tgrep extraction, doing so would have introduced a large amount of extraneous material. As the goal of the data collection was to obtain a large and representative set, such errors were not incorporated.

6. Julia Hockenmaier (personal communication) has corrected many annotation errors in Penn Treebank for the purpose of building a CCG grammar ([Ste96]); however due to the properties of CCG it was not necessary for her to correct errors in the annotation of S-adjuncts.
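The token and type totals in Table 3.3 were obtained with Perl scripts; a minimal sketch of that kind of post-processing is given below in Python. The input format assumed here — one extracted adverbial string per line in a file adverbials.txt — is our own assumption for illustration.

```python
# A minimal sketch of the post-processing step described above: tallying token
# and type counts over extracted adverbials.  The file name and one-adverbial-
# per-line input format are assumptions made for this illustration.

from collections import Counter

def count_adverbials(path):
    with open(path, encoding="utf-8") as f:
        tokens = [line.strip().lower() for line in f if line.strip()]
    types = Counter(tokens)            # distinct adverbial strings and their frequencies
    return len(tokens), types

if __name__ == "__main__":
    n_tokens, types = count_adverbials("adverbials.txt")
    print(f"{n_tokens} tokens, {len(types)} distinct types")
    for adverbial, freq in types.most_common(10):
        print(f"{freq:6d}  {adverbial}")
```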
    • 3.2.4 Summary In this section we have overviewed the structure and function of adverbials, and explained the use of these properties to extract a data set of ADVP and PP S-modifiers for study in subsequent sections. 3.3 Adverbial Modification Types With the exception of the lexical semantics proposed in [Kno96] that was discussed in Chapter 2, the semantics of most discourse adverbials has not been well studied. Most post-generative investigations of adverbials (c.f. [Ver97, Jac90, Gaw86, Ern84, Jac72]), treat syntactic and semantic issues that arise at the clause level, some of which will be discussed below; when discourse adverbials are mentioned at all, they are called conjunctive adverbials or discourse connectives and are specified as the domain of discourse research. Nevertheless, we can use these clause-level investigations as a guide when investigating the semantic mechanisms that cause certain adverbials to function as discourse connectives, because all adverbials can be classified along two semantic dimensions: (1) the type of modification they perform; and (2) the semantic object(s) they apply to. In this section, we investigate how prior analyses of modification type can be extended to include discourse adverbials. 3.3.1 Clause-Level Analyses of Modification Type Because the set of adverbials is so large, adverbials are often classified in the literature in terms of the type of modification they perform. [Ale97] summarizes a variety of modification types that have been proposed in the literature, as shown in Table 3.4 with a corresponding example shown in the second column7 . Though this classification has been applied only to ADVP, it can be applied to PP as well, as shown in the third column of the table. Modification types vary as to whether they are attributed to S or VP modification. For example, the first set shown in Table 3.4 is generally attributed to S-modification, while the third set is generally attributed to VP modification; the second set have been variously analyzed as both S and VP 7 Negation’s adverbial properties are variously treated (c.f. [Ale97]) and are not discussed in this thesis. 87
    • modification (see [Ale97] for discussion). Table 3.4: [Ale97]’s Modification Types Modification Type Conjunctive Evaluative Speech Act/Speaker-Oriented Subject-Oriented Modal Domain Example ADVP consequently unfortunately frankly courageously probably legally Example PP as a consequence to my disfortune to speak frankly in a courageous way in all likelihood in legal terms Time Frequency Location today frequently here on this day at most times at this place Manner Completion/Resultative Degree/Aspectual/Quantificational correctly completely very/always in a correct way in a complete fashion to a large extent/at all times Modification types also vary somewhat depending on the researcher. For example, Table 3.5 shows [Ern84]’s classification of the modification types of S-modifiers. Again, though Ernst studies only ADVP, these types can be applied to PP, as shown in the third column of the table. A comparison of the S-modification types in two tables reveals that while some differences, such as “modal” versus “epistemic”, are nominal, others are categorical; Ernst, for example, subdivides “subject-oriented” into two classes, “agent-oriented” and “mental-attitude”. [KP02] define a yet another set of modification types for both S and VP modification, shown in Table 3.6 along with ADVP and PP examples; not all ADVP have a clear example, however, as indicated by “?” in the table. They use this set to annotate all of the adverbials that appear in Penn Treebank. This annotation is part of a larger project involving the addition of semantic information to parsed corpora. As the primary focus of this project concerns the similarity in the semantic roles played by verbal arguments, not adjuncts, across a variety of syntactic structures, the set of modification types they define is understandably more general. For example, as the ADVP and PP examples indicate, “temporal” is used to label both “time” and “frequency” adverbials, and “other” represents a remainder class, e.g. for those adverbials whose semantic interpretation does not fall 88
    • into any of the other types. Table 3.5: [Ern84]’s Modification Types Modification Type Conjunctive Evaluative Pragmatic Agent-Oriented Mental Attitude Epistemic Domain Example ADVP therefore surprisingly frankly wisely willingly probably linguistically Example PP as a result to my surprise to speak frankly in a wise way in a willing way in all likelihood in linguistic terms Table 3.6: [KP02]’s Modification Types Modification Type Temporal Locative Directional Manner Purpose Discourse Cause Other Example ADVP usually here back quickly ? however therefore probably Example PP in the morning at the barn back to work in a hurried fashion in order to get ahead in addition because of this except for July 4 Building on [Gaw86], [Ver97] takes an even more general view of modification types, although her analysis incorporates, to some extent, on the more specific modification types already discussed. Following [Kas93, PS87], she distinguishes only three different types of modifiers, based on the way in which they incorporate the semantic content of the modified element. Although these distinctions cover both ADVP and PP adverbials, it is not clear whether all S-modifiers have been considered; the focus of her analysis is mainly on semantic differences between verbal adjuncts and complements. In her categorization, restrictive adjuncts specify the value of an index associated with a semantic object that was previously under-specified, such as the time or location of an event, as exemplified by the two italicized temporal and locative adverbial phrases in (3.31 a). (3.31 a) John jogged yesterday in the park. 89
    • In contrast, operator adjuncts predicate on the semantic object they modify and thereby build a more complex semantic representation of the object, as exemplified in (3.32 a) by the predication of an event by the two italicized durative and frequentive adverbial phrases . (3.32 a) John jogged twice a day for twenty years. The third class of modifiers she calls thematic adjuncts. These are stated to be a group that does not fit into either of the other two types and are described as adjuncts which add thematic information about the semantic object, such as the cause of the event, or the means by which the event occurs, as exemplified by the two italicized adverbial phrases in (3.33). (3.33 a) John opened his door with a credit card because of the robbery. The purpose of these distinctions is to explain often-observed ordering restrictions on different adjuncts. In terms of this categorization, the relative ordering of restrictive adjuncts does not usually change the interpretation of a sentence, as a comparison of (3.31 b) with (3.31 a) shows. Because restrictive adjuncts simply specify (or restrict) indices of events, the order in which these indices are specified is irrelevant. (3.31 b) John jogged in the park yesterday. Changing the ordering of operator adjuncts can change the interpretation however, as a comparison of (3.32 b) with (3.32 a) shows. The first operator adjunct asserts the duration of the jogging event, and the second operator adjunct asserts the frequency of the jogging event with its specified duration. (3.32 b) is in fact ungrammatical because it is temporally impossible to repeat an event that lasts for twenty years twice a day. (3.32 b) *John jogged for twenty years twice a day. Although not discussed in [Ver97], thematic adjuncts can also display an ordering preference; (3.33 b) seems harder to process than (3.33 a). (3.33 b) ?John opened his door because of the robbery with a credit card. In Section 3.4 we discuss other analyses of relative ordering restrictions observed in adjuncts. 90
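The contrast between restrictive and operator adjuncts can be made concrete with a small sketch — ours, not [Ver97]'s formalization: restrictive adjuncts fill in underspecified indices of an eventuality, so their relative order does not matter, whereas operator adjuncts build a new predicate over the eventuality description, so their relative order does.

```python
# A toy formalization (ours, not [Ver97]'s) of the restrictive/operator contrast
# in (3.31)-(3.32).

def restrict(event, **indices):
    """Restrictive modification: specify previously underspecified indices."""
    out = dict(event)
    out.update(indices)
    return out

def for_duration(event_desc, years):
    """Operator: 'event_desc holds for <years> years'."""
    return {"op": "duration", "years": years, "arg": event_desc}

def frequency(event_desc, times_per_day):
    """Operator: 'event_desc occurs <times_per_day> times a day'."""
    return {"op": "frequency", "per_day": times_per_day, "arg": event_desc}

jog = {"pred": "jog", "agent": "John", "time": None, "loc": None}

# (3.31): the order of restrictive adjuncts is immaterial -- same result either way.
a = restrict(restrict(jog, time="yesterday"), loc="in the park")
b = restrict(restrict(jog, loc="in the park"), time="yesterday")
assert a == b

# (3.32): the order of operator adjuncts matters -- the two nestings differ.
reading_a = for_duration(frequency(jog, times_per_day=2), years=20)  # (3.32 a): fine
reading_b = frequency(for_duration(jog, years=20), times_per_day=2)  # (3.32 b): anomalous
assert reading_a != reading_b
```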
    • 3.3.2 Problems with Categorical Approaches As the various sets of modification types indicate, it can be difficult to achieve agreement on the proper number and type of modifications that adverbials perform. Depending upon which set of modification types is selected, there are additional difficulties as well. On the one hand, at a finer grain of analysis, such as that employed in [Ale97] or [Ern84], it is often difficult to decide into which particular modification type a given adverbial should be classified. For example, though use of the adverb quickly such as is shown in (3.34) is often classified as conveying “manner” modification, at the same time it also conveys temporal information. Similarly, though use of adverbs like generally and historically] such as shown in (3.35) are often classified as conveying “frequency” modification, they can also be viewed as conveying “domain” modification, as well as an expectation about probability, akin to “epistemic” adverbs. And a great many modifiers of a given type can simultaneously be viewed as “evaluative”, particularly those that are “epistemic” (e.g. probably) and “agent-oriented” (e.g. wisely). (3.34)John ran quickly to the store. (3.35) Generally/Historically, John arrives at work on time. Moreover, many adverbials have different “readings”, each of which may be classified into a different modification type. For example, briefly can be classified as conveying a “manner” (and “temporal”) modification in (3.36 a), but in (3.36 b) it can be classified as conveying a “speech-act oriented” (and “temporal”) modification. And many adverbs that can be classified as “manner” in a sentence such as (3.37 a) can be classified as “domain” in a sentence such as (3.37 b). (3.36 a) John said he will stop by briefly. (3.36 b) Briefly, John said he would stop by. (3.37 a) John is growing emotionally. (3.37 b) Emotionally, John is growing. Cases such as clearly and obviously combine these ambiguities. In addition to having multiple “readings” which can be classified as conveying either a “manner” or “evaluative” (and “spatial”) modification, as shown respectively in (3.38 a)-(3.38 b) (examples from [Ern84]), they simultane91
    • ously convey properties that are epistemic, in that the truth of the modified clause is conveyed as apparent. (3.38 a) John burped clearly/obviously. (3.38 b) Clearly/Obviously, John burped. And even at such a fine grain of analysis, not all clausal adverbials are represented. For example, although these types were not intended to cover the wide variety of PP adverbials (e.g. “except for this”) found in our corpus (as we will see in Section 3.5), it is not clear where an ADVP adverbials such as “regardless of rights and wrongs” would fall, which is also found in our corpus (as we will see in Section 3.6). On the other hand, even if a more general set of modification types, such as that employed in [KP79] or [Ver97], is used, the same difficulty in deciding among categories for a given adverbial can arise, in addition to the problem that potentially valuable distinctions are lost, and a “container” class is necessary to gather remainders that don’t fit in other classes. 3.3.3 Modification Types as Semantic Features On the surface, it might appear that the modification types could at least be used to distinguish discourse adverbials. After all, in Tables 3.4 and 3.5, discourse adverbials are classified as “conjunctive”, and in Table 3.6 they are classified as “discourse”. But these modification types have not been able to distinguish clausal and discourse adverbials. For example, while clause-level researchers treat surprisingly, unfortunately, clearly as clausal adverbials, [Kno96] treats them as discourse connectives, as discussed in Chapter 2. And in fact, all of the “conjunctive” or “discourse” adverbials can be classified into some other modification type, for in contrast to the other modification types, whose purpose is to isolate a particular property conveyed by a set of adverbials, the “conjunctive” or “discourse” type isolates a particular syntactic structure underlying a set of adverbials. For example, discourse adverbials such as then, first, finally, already can be (and are in [KP79]) classified as “temporal”, because they convey temporal information about the relation of the element they modify to the surrounding discourse (or spatio-temporal context). Discourse adverbials such as as a result, consequently can be classified as “evaluative” if the causal connection 92
    • between the element they modify and the surrounding discourse (or spatio-temporal context) is not common knowledge8 . Above we discussed problems of multiple typing and unclear classification as they apply to clausal adverbials. Here we see that these problems extend to discourse adverbials as well. One solution to these problems lies in adopting a non-categorical approach to modification type. If modification types were viewed in terms of semantic features, for example, then adverbials could be represented as supplying multiple (compatible) features. Moreover, these features could be used to represent both clausal and discourse adverbials. In such terms, probably might be used, as exemplified in (3.39), to supply a degree of likelihood feature9 , while clearly and obviously might be used, as exemplified in (3.40), to supply evaluative, epistemic, and spatio-temporal features. (3.39) Probably John woke up at 5 a.m. (3.40) Obviously John woke up at 5 a.m. Similarly, as exemplified in (3.41)-(3.42), both then and on March 14, 1946 might be used to supply temporal features; only then supplies these features in terms of a relation with the prior discourse. (3.41) On March 14, 1946, my father was born. (3.42) Then, my father was born. While such an approach requires further study, such features could be supplemented with [Kno96]’s features for cue phrases, discussed in Chapter 2. As stated above, this approach would overcome the problems of multiple typing and unclear classification found in categorical approaches to modification type by allowing adverbials to supply features from a variety of modification “types”. What modification features cannot distinguish, however, is the fact that while the features supplied by clausal adverbials generally apply to entities and/or properties within the modified element, the features supplied by discourse adverbials apply to the modified element itself and the surrounding discourse or context. Accounting for this difference requires discussion of the semantic objects 8 [Kno96]’s “source of coherence” feature distinguishes this use as “semantic”, as opposed to “pragmatic”, as discussed in Chapter 2. 9 [Ern84] makes a reference to such a feature, but his analysis applies to categorical modification types 93
    • that these adverbials supply features to. Consideration of these objects can also clarify problems with multiple readings in both clausal and discourse adverbials, thereby decreasing the amount of variation that modification features are required to cover. And it works the other way too; understanding the modification features of adverbials can clarify the possible semantic objects they can apply to; together these classifications can be used to build a formal semantic representation of all adverbials. It is this discussion which is the subject of the next few sections. 3.3.4 Summary In this section, we have shown that the modification type alone is not sufficient to distinguish clausal and discourse adverbials. We have suggested that some problems of multiple typing and unclear classification can however be overcome if modification type is viewed in terms of semantic features. 3.4 Adverbial Semantic Arguments A number of issues concerning the semantic interpretation of the external syntactic argument of adverbials have been addressed at the clause level. In this section we present these issues and show how they extend to the discourse level. We show how the discourse deixis research introduced in Chapter 2 exposes the semantic objects that adverbials apply to and distinguishes clausal and discourse adverbials in terms of the the number and type of these semantic objects, thereby laying the foundation for understanding the semantic mechanisms causing the “discourse connectivity” of discourse adverbials. 3.4.1 (Optional) Arguments or Adjuncts? As discussed in Section 3.2, ADVP and PP can function both as adjuncts and VP arguments; ambiguity between these functions has mainly been addressed in relation to PP, but the analyses are applicable also to ADVP. A standard syntactic test for VP argument structure is the “do so” test ([Ver97, KP79]). As shown in (3.43 a-b), where brackets indicate the boundaries of the verb and its internal arguments 94
    • and italics indicate the referent of “do so”, “do so” must replace the entire VP, e.g. the verb and all its arguments; the PP argument cannot be left out. (3.43 a) *Mike [gave a recommendation to Phyllis] and Mary did so to Liz. (3.43 b) Mike [gave a recommendation to Phyllis] and Mary did so too. As shown in (3.43 c-d), if the PP is an adjunct, however, it need not be replaced by the “do so”. [SBDP00, Har99] explain the difference as follows: if the PP supplies an argument, there is no way to determine what semantic function it serves when it modifies “do so”, because “do so” can refer to the interpretation of modified and unmodified verb phrases10 . (3.43 c) Mike [read the textbook] in the bedroom and Mary did so in the classroom. (3.43 d) Mike [read the textbook] in the bedroom and Mary did so too. [PS87] also discuss the “iterability” test, which distinguishes VP adjuncts from arguments because only adjuncts can be “iterated”, as shown in (3.44). (3.44 a) John met Susan in Chicago in the Hyatt hotel in the lobby. (3.44 b) *John gave a book to Debbie to Paul. As [Ver97] points out, however, failing the iterability test, as in (3.44 c), does not always imply argument status; adjuncts are in general not iterable if their semantic contribution is contradictory. (3.44 c) *John met Susan in Chicago in Boston. [PS87] also note that in English, arguments tend to precede adjuncts, as shown in (3.45 a). In (3.45 b), changing the order of the PPs changes the interpretation; the book, rather than the “giving” is read as being located in the library. (3.45 a) John gave a book to Debbie in the library. (3.45 b) John gave a book in the library to Debbie. Furthermore, [PS87] note, many adjuncts cause extraction islands, as shown in (3.46 a), while unbounded dependency into arguments is generally possible, as shown in (3.46 b). As [SBDP00] 10 Such tests aren’t perfect, however; as Bonnie Webber notes (personal communication), (3.43 a) sounds fine if we replace give to with provide for: Mike provided a recommendation for Phyllis and Mary did so for Liz. 95
explain, in LTAG, extraction is modeled as a relation among elementary trees in a tree family that have essentially the same meaning and differ only in syntax. In a tree where an element is extracted, its original position can be located in a different tree in the same family, and its semantics can be computed. Failed extraction indicates that the modifier from which the element is extracted is not present in any elementary tree in the tree family.

(3.46 a) *Where did John give a book to Debbie in?
(3.46 b) Whom did John give a book to in the library?

Semantic tests for distinguishing verbal arguments and adjuncts have also been proposed. As [PS87] note, the semantic contribution of an argument is dependent on the meaning of the head. For example, in "John told a story to Mary", the "telling" must have both a "thing told" and a recipient; "to Mary" fills the latter role. More formally, the semantic contribution of an argument is entailed by the sentence containing the verb11. For example, the PPs in (3.47 a) are entailed (⊨) by the sentence in (3.47 b), though not instantiated, but the PPs in (3.48 a) are not entailed by the sentence in (3.48 b) (examples from [Ver97]). These entailment patterns indicate whether or not semantic information supplied by a PP (or ADVP) is directly relevant to the meaning of the verb ([Ver97]).

(3.47 a) John complained to Mary about the heat.
(3.47 b) John complained. ⊨ John complained to x, John complained about y
(3.48 a) John sang to Mary about his homeland.
(3.48 b) John sang. ⊭ John sang to x, John sang about y

Another semantic test to distinguish verbal arguments and adjuncts is the "presupposition" test12. As defined in [Sae96], the presupposition test is applied by comparing the interpretation of a sentence with a modifier, as shown in (3.49 a), to a corresponding sentence without the modifier, where the information supplied by the modifier is available in the prior context, as shown in (3.49 b) (example from [SBDP00]). If the two sentences can have the same interpretation, then the modifier expresses a presupposed semantic argument.

11. The entailment relation in linguistics is one in which the truth of one sentence necessarily implies the truth of another ([ODA93]).
12. Generally, a presupposition is an assumption or belief implied by the use of a particular phrase ([ODA93]), but a complete account of presupposition is still an open question (c.f. Chapter 5).
(3.49 a) Find the power cable. Disconnect it from the power adapter.
(3.49 b) The power cable is attached to the power adapter. Disconnect it.
While none of these tests are foolproof ([SBDP00]), and the argument structure of many verbs is still undecided, clear cases serve to show that verbal semantic arguments are sometimes syntactically optional, although their semantic contribution is still interpreted using either context or world knowledge. Such syntactically optional arguments, also called “hidden” or “implicit”, have been widely discussed in linguistics, psycholinguistics, and computational linguistics (c.f. [MTC95]).
Implicit arguments are generally classified into two types: definite and indefinite. Definite hidden arguments are anaphoric to some salient entity in the discourse or spatio-temporal context; their interpretation is context-dependent. For example, in (3.50 a)-(3.51 a) optional arguments are instantiated as PPs whose anaphoric complement resolves to the bold-faced element in the prior clause.
(3.50 a) The due date for the grant has passed. Mary didn’t apply for it.
(3.51 a) Bill nearly forgot about going to the bank. John reminded him about it.
As shown in (3.50 b) and (3.51 b), these arguments can be implicit, but only if the information necessary to resolve them is supplied by the prior context. (3.50 c) and (3.51 c) are infelicitous, because this information is not retrievable.
(3.50 b) The due date for the grant has passed. Mary didn’t apply ∅.
(3.51 b) Bill nearly forgot about going to the bank. John reminded him ∅.
(3.50 c) *Mary didn’t apply ∅.
(3.51 c) *John reminded Bill ∅.
Indefinite hidden arguments are not anaphoric with anything; their interpretation is independent of context. [Mit82] discusses the VP eat in detail, in constructions such as (3.52), which is grammatical with or without the explicit argument even without prior context. As shown in (3.53), relational NPs, such as mother, winner, etc., can also contain hidden indefinite NP arguments.
(3.52) Mary ate (something).
(3.53) Mary talked to a mother (of someone) today.
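The contrast between the two types can be sketched in rough logical terms; this sketch is ours, not taken from the works cited above. An indefinite implicit argument can be closed off existentially within its own sentence, whereas a definite implicit argument behaves like a free variable whose value must be supplied by the surrounding context:

[[Mary ate ∅]] ≈ ∃x.ate(m, x)   (cf. (3.52): interpretable with no prior context)
[[Mary didn’t apply ∅]] ≈ ¬apply-for(m, v), where v must be resolved to a salient entity in the prior context (the grant in (3.50 b))

On this way of putting it, (3.50 c) is infelicitous simply because nothing in the context can serve as the value of v.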
Example (3.54) (from [MTC95]) shows that the distinction is not always strict. When implicit, the syntactically optional argument of the verb donate can be anaphoric to information in the prior context, as in (3.54 b), or can be interpreted as unspecified and thus independent of any information in the prior context, as in (3.54 c).
(3.54 a) The United Way asked John for a contribution. John donated five dollars to them.
(3.54 b) The United Way asked John for a contribution. John donated five dollars ∅.
(3.54 c) Hardly anyone knows that John donates thousands of dollars each year ∅.
3.4.2 External Argument Attachment Ambiguity
Even when an ADVP or PP is identified as an adjunct, additional ambiguities may exist. In particular, the positional variability of ADVP and PP discussed in Section 3.2 produces ambiguity as to the phrasal unit being attached structurally and modified semantically. A common example of this ambiguity is shown in (3.55), where the PP with a telescope is ambiguous between adjectival (NP modifying) or adverbial (VP or S modifying) attachment, yielding ambiguity as to who is in possession of the telescope. There are a number of approaches to structural disambiguation of adjectival versus adverbial PP attachment, including probabilistic ones (c.f. [Bik00, CW00, McL01]).
(3.55) John saw the man with the telescope.
A central concern for many clause-level ADVP researchers is the ambiguity between VP and S modification; most of these researchers seek to associate VP and S modification with particular modification types. [Sch71], for example, takes a “deep syntactic” approach, arguing that evaluative and modal adverbs are underlyingly adjectives that take a sentential subject nominal. As evidence of this he cites the synonymy of the two sentences in (3.56 a). He argues that this synonymy distinguishes them from VP modifiers, which, as shown in (3.56 b), cannot be paraphrased with sentential subject nominals even when they are S-initial.
(3.56 a) Obviously, John was at fault. ≈ That John was at fault was obvious.
(3.56 b) Warmly, John spoke to his friend. ≈ *That John spoke to his friend was warm.
In [Swa88]’s terms13, the essence of [Sch71]’s argument is that a sentence containing an S-modifying adverb expresses at least two sets of propositional content: the unmodified and the modified proposition. Swan takes a different view of the modified proposition, however, and includes subject- and speaker-oriented adverbs. In Swan’s view, sentences modified by speaker-oriented adverbs are represented as in (3.57). This representation indicates that, in addition to conveying the unmodified proposition (S), speaker-oriented adverbs also convey a reduced proposition concerning the speaker (Speaker says (S)), which is modified by the ADJ derivative of the adverb.
(3.57) (S), (ADJ(Speaker says(S)))   e.g. frankly, honestly
Sentences modified by evaluative, modal and subject-oriented adverbs are represented as in (3.58). This representation indicates that, in addition to conveying the unmodified proposition itself (S), these adverbs also convey a reduced proposition concerning the speaker in which the speaker is evaluating as ADJ the information in the modified proposition.
(3.58) (S), (Speaker says (ADJ(S)))   e.g. fortunately, probably, courageously
[Gre69, AC74] take a “surface syntactic” approach to distinguishing VP and S modification, arguing that S-modifying adverbs cannot be the focus of negation or interrogation. These tests are exemplified in Table 3.7 with the ADVP italicized; as shown, these tests also apply to PP adverbials.
Table 3.7: [Gre69]’s Syntactic Tests for Distinguishing VP and S Modification
Syntactic Test | ADVP Examples | PP Examples
Adverbial is the focus of negation | He didn’t walk slowly. / *He didn’t walk probably. | He didn’t walk at a slow rate. / *He didn’t walk in all likelihood.
Adverbial is the focus of interrogation | Did he walk slowly? / *Did he walk probably? | Did he walk at a slow rate? / *Did he walk in all likelihood?
However, none of these approaches address the multiple readings of adverbs such as clearly, obviously, strangely discussed above; they pass the tests in Table 3.7, as exemplified in (3.59 a)-(3.59 b), because they display multiple “readings”; as VP-modifiers, they are some variety of “manner”
13 Swan’s main goal is to provide a corpus-based analysis of English -ly S modifying adverbs as having developed from VP modifying intensifier and manner adverbs, through Old and Middle English, by means of syntactic/pragmatic shifts.
    • adverbs, while as S-modifiers they are often classified as “evaluative”. (3.59 a) John didn’t burp clearly/obviously/strangely. (3.59 b) Did John burp clearly/obviously/strangely? [Jac72] and [MG82] argue that such cases should be treated as homonyms, and given multiple lexical entries. [Ern84] simplifies this analysis by giving adverbs, whenever possible, a single semantic interpretation, while leaving their syntactic argument underspecified in the lexical entry when multiple interpretations are possible. For example, because modal adverbs such as probably do not change their meaning no matter their position in the clause, as a class their syntactic argument can be lexically specified in a single lexical entry as S. But when multiple readings are possible, as for obviously and strangely, Ernst treats the syntactic argument as lexically underspecified, drawing its syntactic argument from the closest dominating node14 . Obviously, for example, always conveys that the argument is obvious, but the argument can be a VP, yielding a manner interpretation, or an S, yielding an evaluative interpretation. Ernst also applies this argument underspecification analysis to adverbs that have been previously classified with a single modification type but that also display subtly different “readings”. Examples commonly classified as “agent-oriented” are shown in (3.60). (3.60 a) John approached the Duchess tactlessly. (3.60 b) Tactlessly, John approached the Duchess. In Ernst’s analysis, tactlessly in (3.60 a) describes a judgment about the “approaching”, and the adverb takes a VP argument, while tactlessly in (3.60 b) describes a judgment about the situation involving John’s action of “approaching”, and the adverb takes an S argument. In both cases, “John” is interpreted as “tactless”. However, Ernst argues that cases such as (3.61)-(3.62) show that this is only an inference. For example, we would normally draw from (3.60 a) the inference that John is tactless. But we can use the S-modifier with the opposing meaning (tactfully) in (3.61), along with the additional context provided by the VP adverbial (knowing...), to block the inferred judgment that John is tactless, while still asserting that John’s approach is tactless. 14 In the normal case, that is. Many readers will be able to get multiple readings in any position; arguably the closest dominating node corresponds to the first, or easiest, reading. 100
(3.61) Tactfully, John approached the Duchess tactlessly, knowing that apparent disregard for authority was highly prized in this culture.
(3.62) Stupidly, Alice had answered the questions wisely and blown her cover as an inmate of the insane asylum.
While S-readings cannot be faked, as exemplified in (3.63), where both adverbs are S-modifiers because the closest dominating node of both is S, Ernst argues that this lack of a felicitous “faked reading” is accounted for by the fact that the S-modifiers assert opposing judgments about the same semantic object; the only way to block one of the judgments about Alice is to explicitly claim that the surrounding circumstances were abnormal, as in (3.64).
(3.63) *Stupidly, Alice wisely had answered the questions.
(3.64) Although it normally would have been stupid for Alice to blow her cover as a mime, Alice wisely had answered the questions because she had already noticed that her backup had arrived.
Ernst’s use of variations in the external semantic object intuitively feels correct. Investigating the use of adverbs (or PPs) with gradient meanings (instead of just opposing meanings) would indicate if more than two argument attachment sites for interpreting semantic objects were required; adjectives, from which many adverbs are derived, are generally non-gradient, however [WN98]. Nevertheless, Ernst’s analysis is largely unformalized, with respect to both the syntactic and semantic representation of adverbs and their arguments; moreover, it does not determine how to decide if a single modification type (such as “temporal”) should be viewed as S or VP.
[Ale97] bases this decision on observed ordering restrictions on adverbs, while also providing an analysis of the syntactic representation of adverb attachment. As discussed in Section 3.3, [Ver97] explains certain ordering restrictions in terms of modification type; [Ale97], in contrast, explains a wider variety of these restrictions in terms of both modification type and underlying adjunction site.
[Ale97] notes that strict sequencing and scope hierarchies attested across languages are correlated with modification type. For example, in English, as exemplified using modal adverbs in (3.65 a)-(3.65 b), S-modifying adverbs must appear higher in the syntactic tree than manner adverbs, and as shown in (3.66 a)-(3.66 b), evaluative adverbs must appear higher than agent-oriented adverbs.
(3.65 a) Probably John cleaned the room carefully.
(3.65 b) *Carefully John probably cleaned the room.
(3.66 a) Fortunately, John cleverly climbed to the top carefully.
(3.66 b) *Cleverly, John fortunately climbed to the top carefully.
Working within the Minimalist program, Alexiadou provides a cross-linguistic syntactic account for such restrictions on ADVP according to their modification type. In this analysis, “manner” adverbs are base-generated below the verb, and their appearance in other positions involves movement to a specifier position of a functional projection as licensed by Minimalist principles. Other adverbs are base-generated as specifiers of functional projections to the left of the verb. [Ale97] distinguishes a variety of functional projections, distributes along them (as their underlying place of attachment) ADVP that exemplify the various modification types, and in each case demonstrates the range of felicitous surface positions that result from movement of the modified phrasal categories.
Of course, given how large the set of ADVP adverbials is, and the inability of modification types to fully categorize all adverbs, it’s not clear if [Ale97] accounts for all possible sequences and positions for all ADVP adverbials. In particular, the apparent freedom of comma- (or pause-) delimited adverbials such as those shown in (3.65 c)-(3.66 c) is largely unaccounted for using Minimalist principles, although to be sure the felicitousness of these cases may vary to a great extent on a speaker-to-speaker basis, and the cases themselves may be much more common in speech than in text, thus making them analyzable as self-corrections.
(3.65 c) Carefully John, probably, cleaned the room.
(3.66 c) Cleverly, John, fortunately, climbed to the top carefully.
[Bie01] cites [McC88]’s treatment of comma-delimited modifiers as S-adjuncts, but notes that this analysis allows infelicitous sentences such as the one shown in (3.67), where other than Mary should be interpreted as modifying everyone, despite its comma-delimited S-medial position.
(3.67) *What food, other than Mary, repels everyone?
In such an analysis, the relationship between everyone and the other than Mary phrase would have to be resolved anaphorically. [Bie01] proposes instead a fully structural analysis in CCG
[Ste00c] which uses type-raising to cover the felicitous positions of alternative phrases. We will discuss his semantic analysis of these phrases in Section 3.5.
3.4.3 Semantic Representation of External Argument
As discussed in Chapter 2, much of the linguistics literature addresses only the proposition interpretation of an S; the semantic representation of S is usually a truth value: true or false. The above discussion indicates that S makes available other interpretations as well. In [Ver97, KP79], for example, and across the adverbial literature in general, reference is made to the modification of events and situations. [Ern84] distinguishes an even wider variety of S interpretations, including actions (or events), situations, states of affairs, and mental states. However, in most cases this classification is described rather vaguely; while these interpretations may be used to distinguish various properties of adverbials, the properties distinguishing these interpretations are not well-defined. For example, Table 3.8 provides the semantic interpretations of some of Ernst’s modification types15.
Table 3.8: Semantic Interpretations of [Ern84]’s Modification Types
Example ADVP | Modification Type | Semantic Interpretation | Semantic Object (α) | Derived ADJ
slowly | Manner | α is ADJ | action | slow
wisely | Agent-Oriented | agent judged ADJ due to α | situation | wise
legally | Domain | α relevant in ADJ domain | situation | legal
possibly | Epistemic | α is ADJ | situation | possible
fortunately | Evaluative | α is ADJ | state of affairs | fortunate
angrily | Mental Attitude | α manifests ADJ of agent | mental state | angry
The first column of the table contains an example adverb, and the second column contains the modification type into which it is classified. The third column contains the semantic interpretation of the modification, and defines the involvement of the semantic interpretation, α, that is equated with the external syntactic argument. The specific object equated with α for each interpretation is shown in the fourth column. Note that Ernst makes use of the adjective (ADJ) derivative of the ADVP; the fifth column contains the ADJ derivative for each adverb.
15 Certain of Ernst’s modification types are subdivided with respect to these interpretations.
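To make the glosses in Table 3.8 a little more concrete, two of its rows might be rendered schematically as follows; this is our own rough paraphrase, not a formalization that Ernst provides:

fortunately:  [[fortunately]](α) ≈ fortunate(α), with α a state of affairs
wisely:  [[wisely]](α) ≈ judged-wise(agent(α), α), with α a situation

On such a rendering, the modification types differ both in which kind of semantic object instantiates α and in how the derived ADJ is predicated of it.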
[Moo93], in contrast, provides a formal representation of the possible semantic interpretations of ADVP adverbials’ external arguments. His formalization is based on the same data addressed in [Ern84], in particular the two different interpretations exemplified in (3.68)-(3.69) that are produced by a single adverb located in different positions. Moore notes that in (3.68), strangely intuitively modifies the event of John singing. In (3.69), however, the singing event may be quite ordinary; it is the fact that John sang that is strange.
(3.68) John sang strangely.
(3.69) Strangely, John sang.
Following [Dav67], Moore argues that event sentences16 assert the existence of an event in the domain of entities. It is this event, these researchers argue, that strangely modifies in (3.68). This is represented in predicate logic as shown in (3.68’), where x represents a hidden argument to the verb17 that ranges over events, and strangely is represented as additional predication of x. In words, the formula asserts that there exists a singing-by-John event, and that event is strange.
(3.68’) ∃x(Sang(j,x) & Strange(x))
Moore further argues that true propositions assert the existence of a situation (or fact) in the domain of entities. In Moore’s view, it is this situation that strangely modifies in (3.69). This is represented in predicate logic as shown in (3.69’), where “Fact” denotes a relation between a situation and a proposition, y is represented as a hidden argument to this relation that ranges over situations, and strangely is represented as additional predication of y. In words, the formula asserts that there exists a situation (fact) of there being a singing-by-John event, and that situation is strange.
(3.69’) ∃y(Fact(y, ∃x(Sang(j,x))) & Strange(y))
3.4.4 Semantic Arguments as Abstract Objects
In essence, both Moore and Ernst associate adverbial VP attachment with event (or action) modification. While Moore associates adverbial S attachment with situation (or fact) modification, a
16 sentences that describe events, in contrast to sentences containing the be verb, e.g. Mary is kind.
17 See [Moo93, Dav67, BP83] for reasons why this argument is associated with the verb and not the sentence.
    • situation is only one of the possible semantic interpretations that Ernst associates with adverbial S attachment; he also mentions the modification of states of affairs and mental states. In fact, the discourse deictic research discussed in Chapter 2 provides evidence that the possible semantic interpretations associated with adverbial attachment are more diverse than Moore and even Ernst acknowledge. In Chapter 2 we referred to the possible interpretations that an S makes available as abstract objects (AOs). As illustration, recall the variety of AO types proposed by [Ash93], shown again in Figure 3.4. We demonstrated how discourse deixis could be used to refer to these objects. Figure 3.4: [Ash93]’s Classification of Abstract Objects Also in Chapter 2, [Ven67], [Web91], [DH95] and we ourselves observed that a variety of additional AOs, including descriptions, beliefs, speech acts, textual objects, and even defeasible rules and discourse relations can be the objects of discourse deixis reference. More generally, as stated in Chapter 2, there are at least as many AOs as there are abstract nouns. In many cases they can be referred to via discourse deixis simply by inserting them into the sentence “That is a(n) ...”. Whether or not the classification of AOs shown in Figure 3.4 is complete, we argue that AOs provide an appropriate way of understanding adverbial semantics. We argue that the range of semantic objects evoked when an adverbial modifies an S coincides with the range of semantic objects to which a discourse deictic refers. As a preliminary illustration, consider Table 3.9. The third and fourth column contain PP and ADVP clausal and discourse adverbials. The second column contains comparable discourse deictic reference. For illustrative purposes, consider the sentence People made mistakes as the one being 105
referred to and modified. The first column specifies the resulting AO interpretation.
Table 3.9: Abstract Object Interpretations
Abstract Object | Discourse Deictic | PP Adverbial | ADVP Adverbial
event/situation | that happened afterwards | after that, | afterwards,
fact | that’s a fact | in fact, | really,
proposition | that’s true | in truth, | truly,
description | that’s a good description | as a description, | descriptively,
belief | that’s my belief | in my view, | personally,
speech act | that’s to be frank | in plain English | frankly,
textual object | repeat that | as a repetition | again,
As the table shows, we can interpret the sentence as a variety of AOs. If referred to by a discourse deictic, it is the predication on the discourse deictic that determines the AO interpretation. For example, “happening afterwards” is a property of events, so the referent of the discourse deictic is interpreted as an event. If the sentence itself is modified, it is the adverbial that performs the predication, thereby determining the AO interpretation. For example, “afterwards” is also a property of events, so the sentence is again interpreted as an event. More generally, notice that we can in many cases create a PP adverbial for an AO simply by inserting it into the PP “As a(n) ...”. In Sections 3.5-3.6 we will see a wide variety of AOs instantiated in this and other S-modifying adverbials.
Though our data consists of S-modifying adverbials, we have not excluded events or distinguished them from situations in the above table, because for many S-modifying adverbials, including temporal, frequency and spatial adverbials, event and situation modification cannot easily be distinguished. And if these adverbials modify events, and events are made available by VP, then either a movement analysis or a percolation analysis is required to explain how these adverbials can modify events while adjoining to S. Moreover, we will see in our data that the possible interpretations of VP are as important for distinguishing discourse and clausal adverbials as the possible interpretations of S, just as they are for distinguishing discourse deictic reference from NP reference in demonstrative use. In Chapter 6 we will return to the issue of semantic representation of AOs and adverbials, taking into consideration both the analyses proposed in discourse deixis research and in adverbial research. Until then, we retain the distinction between the semantic origin of events and
situations via the phrase “AOs made available by S”.
For the purposes of brevity in subsequent sections, we define a working terminological distinction between concrete objects and abstract objects. These objects are distinguished as follows: concrete objects are entity interpretations made available by (e.g. denoted by or inferable from ([Pri81])) NPs. Concrete objects are usually, but not always, perceivable by the senses [Ash93], and include people, organizations, physical objects, etc. Abstract objects are entity interpretations made available by both NPs and non-NP constituents, including verb phrases, clauses, etc. As discussed in Chapter 2, abstract objects are usually, but not always, imperceptible to the senses, and include reasons, beliefs, trials, defenses, theories, rights, etc. In practical application, the distinction is more of the “I will know it when I see it” variety, and it may not be categorical; demonstrating it for every entity would require an infinite corpus, which is impossible, or constructed examples, which are often suspect. Nevertheless, the theoretical distinction between these two types of entities clarifies the difference between the anaphoricity of discourse and clausal adverbials. However, it does not account for AOs retrieved from the spatio-temporal context. Our corpus consists only of text, and though there is some discussion of this in the subsequent sections, in our view, the role of the spatio-temporal context is still an open question.
3.4.5 Number of Abstract Objects
Clausal adverbials (e.g. in my view) and discourse adverbials (e.g. afterwards) are both contained in Table 3.9 because both take as their external semantic argument an abstract object interpretation made available by S. Thus, at both the syntactic and semantic level, clausal and discourse adverbials are not distinguishable in terms of their external argument. It is in terms of the number and interpretation of their arguments that significant semantic differences between clause and discourse adverbials appear. We argue that the number of arguments an adverbial contains and the interpretation of these arguments determine whether or not it functions semantically as a discourse connective and is thereby classified as a clause or discourse adverbial.
We claim that discourse adverbials contain at least one argument that depends for its interpretation on some salient AO contained in or derivable from the discourse context, which thereby renders
them uninterpretable with respect to their matrix clause alone, even after resolution of any concrete object arguments. In contrast, clausal adverbials contain no such argument; their interpretation is context-independent after resolution of any arguments to contextual concrete objects18.
Discourse adverbials are thus very similar to discourse deixis in that both require an AO interpretation from the prior discourse or spatio-temporal context for their interpretation. [WJSK03] provide strong behavioral evidence for this view. For example, just as in (2.50), repeated from Chapter 2, the discourse deictic takes as its referent the discourse relation between clauses, so does the PP adverbial as one example in (3.70) derive its prior argument from the result relation (imparted by so) between the two clauses.
(2.50) If a white person drives this car it’s a “classic”. If I, a Mexican-American, drive it, it’s a “low-rider”. That hurts my pride. [DH95]
(3.70) John just broke his arm. So, as one example, he can’t cycle to work now.
More generally, notice that just as a discourse deictic can be replaced by its explicit demonstrative+AO counterpart, e.g. that text, that speech act, or that contrast in (2.50), in the same way, we can construct an adverbial from a discourse deictic or its explicit demonstrative+AO counterpart, thereby creating a discourse adverbial which explicitly relates two abstract objects, e.g. for that reason, in this case, after that, or as an example of the consequences of that in (3.70).
Of course, as (3.70) indicates, explicit demonstrative reference is not the only mechanism by which discourse adverbials are created. In the next two sections we discuss these mechanisms in detail. Before beginning our corpus analysis, however, note that the appropriate semantic representation of almost all the mechanisms discussed in Sections 3.5-3.6 is still an active line of research. As such, the various research we will discuss has employed a variety of formalizations (at various levels of complexity). As our goal in these sections is to discern the semantic mechanisms underlying adverbial function, we will not be advocating a particular formalization, but will present a variety as required to represent the particular properties we are focusing on in each of the semantic mechanisms we discuss.
18 As will be discussed in Chapter 5, however, there are other ways apart from their semantics that both types of adverbial can evoke contextual AO interpretations.
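As a preliminary illustration of the kind of representation at issue, and without committing to any particular formalism, the discourse adverbial in (3.70) can be thought of as a two-place relation one of whose arguments must be found outside its matrix clause; the sketch below is ours:

as one example in (3.70):  example-of(E, AO(S)), where AO(S) is the abstract object interpretation of the matrix clause (he can’t cycle to work now), and E is an abstract object (or set of abstract objects) that must be recovered from the prior discourse — here, the consequences of John breaking his arm, made available by the result relation that so imparts.

A clausal adverbial such as in my view, by contrast, relates AO(S) only to a concrete object (the speaker), so nothing beyond the matrix clause and that entity is needed to interpret it.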
We also emphasize that we have distinguished clausal and discourse adverbials semantically based only on the number and interpretation of the arguments they contain. It may or may not prove useful in practical application to further distinguish clausal and discourse adverbials according to the resolution of their arguments. For, as noted above, and as we will see in the next two sections, AOs are made available by both NPs and non-NP constituents (as well as by language-external context). Thus, if an AO is nominalized in the discourse, the definite semantic argument of a discourse adverbial may resolve to that nominalization. A stronger distinction between discourse and clausal adverbials would assert that an adverbial functions semantically as a discourse connective if and only if the interpretation of one or more of the semantic arguments it contains is dependent on an AO that is retrieved, or reified ([Web91]), from a non-NP constituent. Such a distinction might be made under the assumption, for example, that NP-reference is distinguished from reference to non-NP constituents in anaphora resolution algorithms. We will discuss this issue in more detail in Chapter 6.
3.4.6 Summary
In this section, we have presented a number of analyses proposed at the clause level to account for the semantic interpretation of adverbial modification. We have argued, and will show in the next section, that such clause-level semantic tools can be extended to account for commonalities between clausal and discourse adverbials, while also distinguishing semantic differences between them. In particular, we have argued that discourse deictic research provides a better basis for the classification of the semantic arguments of adverbials than clause-level research, and have presented our formal definitions of the semantic distinction between clausal adverbials, which do not function semantically as discourse connectives, and discourse adverbials, which do function semantically as discourse connectives. We have defined this distinction according to the number and interpretation of semantic arguments these adverbials apply to:
discourse adverbials: contain at least one argument that depends for its interpretation on some salient AO contained in or derivable from the discourse context; their interpretation is context-dependent.
clausal adverbials: contain no such argument; their interpretation is context-independent after resolution of any semantic arguments to contextual concrete objects.
We view these definitions, and our discussion in the next two sections of the semantic mechanisms that motivate them, as a starting point, providing a foundation that is general enough to be supplemented by a variety of semantic formalisms. In Chapter 4 we will select a working formalism to show how the semantics developed in this chapter can be incorporated into the DLTAG model. In Chapter 5, we will see other ways an adverbial can contribute to discourse coherence.
3.5 S-Modifying PP Adverbials
As discussed in Section 3.2, the syntactic structure of the S-modifying PP adverbials in our corpus can be represented with the tree in Figure 3.5, where P represents the preposition head, S represents the external argument, Arg_int represents the internal argument, and SPEC represents optional (specifier) modifiers of the head19.
[The tree in Figure 3.5 shows the PP adjoined to S, with the PP expanding into the optional specifier, the preposition head, and the internal argument; as a labeled bracketing: [S [PP Spec P Arg_int ] S ].]
Figure 3.5: Syntactic Structure of S-Modifying PP Adverbials
Both syntactically and semantically, therefore, all the prepositional phrases in our data set are binary predicates. As illustration, they can all be represented semantically using lambda calculus (c.f. [HK98]) as in (3.71), where “[[ ]]” represents an interpretation function, preposition represents the relation supplied by the preposition to its arguments x and y, and where we resolve x to the interpretation of the internal argument, and y to the interpretation of the external S argument. For example, if m represents the interpretation of the NP Mary, then the interpretation of the PP for
19 Although as discussed in Section 3.2, internal arguments are not always found in PPs, only a few PPs in our data did not take an internal argument. These were either misparsed subordinating conjunctions or topicalized verb constituents (e.g. “on they came out” → “they came on out”).
Mary can be represented as in (3.72), and if is(w,l) represents the interpretation of the S work is life, then the final interpretation of the S-adjoined clausal adverbial can be represented as in (3.73).
(3.71) [[PP]] = λx λy. preposition(x, y)
(3.72) [[for Mary]] = λy. for(m, y)
(3.73) [[For Mary, work is life]] = for(m, is(w,l))
Thus, the predicate-argument structure of the PP adverbial does not distinguish clausal and discourse adverbials. However, if we exchange Mary for that reason, we produce the discourse adverbial for that reason. In order to distinguish clausal and discourse PP adverbials, therefore, we must investigate the semantic mechanisms underlying the predicate argument structure and interpretation of the internal arguments of the PP adverbials in our corpus.
3.5.1 Proper Nouns, Possessives, and Pronouns
Approximate counts in our corpus of types and tokens of the internal arguments discussed in this section are shown in Table 3.10.
Table 3.10: Approximate Counts of Tokens and Types of some Internal PP arguments
# Tokens | # Types | Internal Argument
776 | 428 | proper nouns and years
307 | 273 | single nouns modified by possessives
192 | 68 | pronouns
As shown, a number of PP internal arguments in our data set are proper nouns or years. Table 3.11 provides some examples along with their corpus counts; the majority occurred only once.
Table 3.11: PP Adverbials with Proper Noun or Year Internal Argument
# | PP Adverbial | # | PP Adverbial
2 | after 1832 | 8 | in Tokyo
2 | by God | 30 | on Friday
2 | for Blanche | 1 | to Africa
24 | in August | 1 | until 1971
    • Although proper nouns and years may be interpreted with respect to the discourse or spatiotemporal context, they do not refer to abstract object interpretations; rather, they denote people, places, animals, etc., which we have called concrete objects. Their semantic interpretation can be represented as exemplified in (3.74), where bold-face represents the denoted concrete object. (3.74) [[God]] = God Some of the internal arguments in our corpus are single nouns modified by a possessive proper noun; Table 3.12 provides some examples along with their counts. Table 3.12: PP Adverbial with Possessive Proper Noun Internal Argument # 1 2 1 1 1 PP Adverbial despite Berger’s report for God’s sake in Blanche’s defense in Krutch’s view in Plato’s judgment # 1 1 1 1 1 PP Adverbial to Ann’s consternation to Welch ’s chagrin under Yakov Segal’s direction with Herberet ’s blessing within Erikson’s schema As shown, we’ve provided cases in which the head noun denotes an abstract object. For example, views, judgements, consternation, blessing are all abstract objects. However, the modification by possessive proper nouns does not make these abstract objects context-dependent; the semantic representation of these NPs is akin to the representation in (3.74). Of course, these NPs may be coindexed with abstract objects in the prior discourse, as shown in (3.75), where Herberet’s blessing can be interpreted as the action of Herberet raising his hand and praying. (3.75) Herberet raised his hands and began to pray. Mike and Mary knelt before him. They looked into each other’s eyes and smiled. With Herberet’s blessing, they would be married. A number of the PP internal arguments in our data set are pronouns or are adverbs that are functioning as pronouns (e.g. now, then, here). Table 3.13 provides some examples along with their corpus counts. As shown, many of these pronouns are animate pronouns, by which we mean the first and second person pronouns, and third person pronouns that refer to animate entities. Although animate pronouns do not refer to abstract object interpretations, they are anaphoric or deictic, and 112
must be interpreted with respect to NPs present in (or inferable from [Pri81]) the discourse or spatio-temporal context. In [HK98], the semantic interpretation of a pronoun is represented as denoting an entity via an index i that is mapped to an entity relative to an assignment function g_C, where g_C is determined by a context C. An example is shown in (3.76).
(3.76) [[you_i]] = g_C(i), where C might provide the assignment g_C = [ 1 → Kim, 2 → John, 3 → Sandy ]
Table 3.13: PP Adverbials with Pronominal Internal Argument
# | PP Adverbial | # | PP Adverbial | # | PP Adverbial
1 | above me | 4 | for now | 8 | to me
14 | after that | 2 | from here | 28 | since then
2 | among them | 2 | from this | 8 | until recently
5 | beyond that | 4 | in it | 5 | until then
6 | by now | 4 | in this | 2 | with that
4 | for me | 1 | like you | 1 | with them
Not all pronouns only refer to concrete objects, however. As Table 3.13 shows, there are numerous pronominal internal PP arguments found in our corpus that may be discourse deictic and refer to an abstract object interpretation. These are cases of demonstrative pronouns, inanimate third person pronouns it20, or adverbs functioning as pronouns. Demonstrative reference to abstract objects was discussed in Chapter 2; we will discuss (discourse) deictic adverbs in Section 3.6. The semantic interpretation of these pronominals can be represented in the same way as other pronouns, e.g. as denoting an entity relative to a context-determined assignment function. The difference is that here the context C may determine that the entity-denoting index is assigned by g_C to an abstract object interpretation of either an NP or a non-NP constituent. That these internal arguments can reify abstract objects from non-NP constituents is shown in (3.77)-(3.78), where both that and then refer to the event interpretation of the first sentence. Note the potential for ambiguity, however. In (3.79), then can refer either to the NP the morning or to the event of waking early.
20 Recall from Chapter 2 however that unless the predication on it is sufficiently semantically-enhanced ([Byr00]) (and this is not provided by a preposition), an abstract object it refers to must have already been referred to with an NP.
    • (3.77) I went to the movies. After that, I ran a hundred errands. (3.78) I went to the movies. Since then, I’ve run a hundred errands. (3.79) In the morning, I woke up early. Since then, I’ve run a hundred errands. There are also a number of cases in our corpus where the internal argument is a single noun modified by a possessive pronoun. Some examples are shown in Table 3.14. Again, we’ve provided cases in which the interpretation of the modified nouns is an abstract object. For example, natures, opinions, knowledge are abstract objects. Moreover, possessive pronouns are anaphoric or deictic, but they refer to concrete objects. Therefore, like possessive proper nouns, these NPs may, but need not, be coindexed with abstract objects in the prior discourse. Table 3.14: PP Adverbial with Possessive Pronoun # 1 1 1 3 1 3.5.2 PP Adverbial by its nature before his departure despite his yearning for his part in our case # 2 2 1 2 1 PP Adverbial in my opinion in my view to his surprise to my knowledge under his supervision Demonstrative and Definite Determiners Approximate counts in our corpus of types and tokens of the internal arguments discussed in this section are shown in Table 3.15. Table 3.15: Approximate Counts of Tokens and Types of some Internal PP arguments # Tokens 468 310 # Types 266 180 Internal Argument single nouns modified by definite article single nouns modified by demonstrative determiner Numerous instances of single nouns modified by the definite article (the) are found as the internal argument of the PP adverbials in our data set. In some cases, these nouns denote concrete objects; Table 3.16 provides some examples along with their corpus counts. In many cases, how114
    • ever, these nouns denote abstract objects. Table 3.17 provides some examples of definite abstract objects along with their counts. Here we see numerous novel examples of abstract objects, such as criticism, record, evidence. We also see that some nouns have one interpretation as a concrete object and another metaphorical interpretation as an AO, such as board. We also see a deictic noun, past (present also occurs in our corpus), which is always interpreted with respect to either the spatio-temporal context or the discourse time. Table 3.16: PP Adverbials with Definite Concrete Object Internal Argument # 1 3 1 1 1 1 PP Adverbial above the tongue at the door below the fort beyond the forest down the boulevard for the boy # 2 1 1 1 1 1 PP Adverbial in the city inside the courtroom near the coast since the hurricane toward the west within the individual Table 3.17: PP Adverbials with Definite AO Internal Argument # 1 2 2 6 2 11 17 2 1 2 4 PP Adverbial across the board after the payment after the split along the way at the close at the moment at the time by the way despite the criticism during the trial for the moment # 1 1 20 14 18 6 12 3 4 7 1 PP Adverbial for the record from the evidence in the end in the meantime in the past in the process on the contrary on the surface on the way under the agreement with the increase Semantically, definite nouns denote a specific known entity. However, definite nouns are not necessarily anaphoric to something salient in the prior discourse or spatio-temporal context. For example, in (3.80), the trial is inferable from the reified AO interpretation of the suing event. But in (3.81), the record simply refers to an abstract notion of right and wrong. 115
    • (3.80) John sued Mary for all she was worth. During the trial, he cried profusely. (3.81) For the record, graduate students don’t get paid enough. The defining semantic feature of definite descriptions is that they are used when only one entity corresponds to their description ([HK98]). Thus “the king of America” is infelicitous because there is no corresponding entity. [HK98] views this failure to refer as a presupposition failure, and represents definite determiners as partial functions whose domain contains only those nouns that correspond to one entity in the set of individuals, and whose range contains the denotations of those nouns. As noted, a definite noun may or may not be anaphoric, however, in the sense that it recovers an entity in the prior discourse. There are numerous instances of single nouns modified by demonstrative determiners found as the internal argument of the PP adverbials in our data set. In some cases, these nouns denote concrete objects; Table 3.18 provides some examples along with their counts. Table 3.18: PP Adverbials with Demonstrative Concrete Object Internal Argument # 1 2 1 1 1 PP Adverbial above these jobs at that price at these offices in these families in these organizations # 2 2 2 1 1 PP Adverbial in this article in this play of that amount on these generators to these people In the great majority of cases, however, these nouns denote abstract objects. Because many of these nouns appear with a variety of demonstratives, we have conflated them in Table 3.19 in order to present a wider variety. Here we see “basic” AOs such as events, situations, and facts. However, we also see backdrop and circumstances, which can be situations, study and service, which can be events, and basis and reason which can be facts. Demonstrative NPs, like demonstrative pronouns, are anaphoric or deictic and must be interpreted with respect to the discourse or spatio-temporal context. They can either be represented semantically akin to other anaphoric reference, e.g. via an assignment function, or they can be represented using partial functions, akin to definite descriptions. And as is also true for demonstrative 116
pronouns, the context may determine that they reify abstract object interpretations of VP or S. Examples are shown in (3.82)-(3.83), where that service and that reason refer to the AO interpretation of the first sentence. Again, however, the demonstrative NP’s referent may be a previously mentioned abstract object NP, as in (3.84), or it may be ambiguous, as in (3.85), where that reason can refer to the previously mentioned NP a reason, or it can refer to the AO interpretation of the first sentence.
(3.82) John helped Mary wash the car. After that service, she paid him $40.
(3.83) John couldn’t sleep. For that reason, he got out of bed.
(3.84) Yesterday you gave me a good reason to move. For that reason, I thank you.
(3.85) It wasn’t until yesterday that you told me your reason for leaving. For that reason, I am mad at you.
Table 3.19: PP Adverbials with Demonstrative AO Internal Arguments
# | PP Adverbial | # | PP Adverbial
1 | after that service | 2 | in these circumstances
2 | against that/this backdrop | 5 | in this connection
1 | along these lines | 2 | in that event
21 | at that/this point | 1 | in that function
2 | at this stage | 3 | in this instance
1 | by that logic | 4 | in this manner
2 | by that/these measure(s) | 8 | in this/these respect(s)
1 | by this standard | 2 | in this sense
13 | by that/this time | 12 | in that/this way
1 | despite these challenges | 3 | on this basis
1 | despite these facts | 1 | outside those limits
1 | during this study | 5 | to that/this end
4 | for that matter | 2 | under this plan
5 | for that/this reason | 1 | with this situation
25 | in that/this/these/those case(s) | 2 | within that/this framework
3.5.3 Indefinite Articles, Generic and Plural Nouns, and Optional Arguments
So far we have examined how explicit AO reference creates PP discourse adverbials. In this section we use single generic, plural and indefinite single nouns to demonstrate other semantic mechanisms
at work in the internal arguments of PP discourse adverbials. Approximate corpus counts of these PP internal arguments are shown in Table 3.20. This analysis also applies to some of the internal arguments discussed above; we have simply ignored this aspect until now.
Table 3.20: Approximate Counts of Tokens and Types of some Internal PP arguments
# Tokens | # Types | Internal Argument
229 | 68 | single nouns modified by indefinite articles
1020 | 233 | single generic and plural nouns
There are numerous instances of single nouns modified by indefinite articles found as the internal argument of the PP adverbials in our data set. In some cases, these nouns denote concrete objects; Table 3.21 provides some examples along with their counts.
Table 3.21: PP Adverbial with Indefinite Concrete Object Internal Argument
# | PP Adverbial | # | PP Adverbial
1 | after a roundup | 1 | in a saucepan
1 | as a boy | 1 | to a stranger
3 | as a group | 1 | under a microscope
1 | for an anthropologist | 1 | with a bellow
In many other cases, these nouns denote abstract objects. Table 3.22 provides some examples along with their counts. Again we see numerous novel AOs, including rule, quirk, sense, etc.
Table 3.22: PP Adverbial with Indefinite AO Internal Argument
# | PP Adverbial | # | PP Adverbial
2 | as a rule | 1 | in a fashion
1 | by a quirk | 4 | in a sense
18 | in a statement | 4 | in a way
10 | for a moment | 1 | on an impulse
7 | for a while | 1 | to a degree
The semantics of indefinites is the subject of much current research (c.f. [Roo95b, vdB96, HK98, Hei82]). In [HK98] indefinite articles are represented as total functions, e.g. they place no
requirements on the nouns in their domain. More generally, these indefinite nouns can be represented as unary predicates denoting an unspecified entity; in predicate logic this is represented as in (3.86).
(3.86) [[an impulse]] = ∃x.impulse(x)
The indefiniteness of these nouns, whether they denote concrete objects or abstract objects, does not cause them to resolve or refer to entities in the prior discourse. This is not to say that certain of the adverbials in Table 3.22 are not sometimes treated as discourse connectives; we argue here only that their semantic interpretation does not require an abstract object in the prior discourse. We will return to this issue in Chapter 5.
Not all indefinite nouns are unary predicates, however. Certain of the indefinite nouns in our corpus are relational nouns, which take a syntactically optional, or hidden, or implicit, argument that is anaphoric to some salient entity in the prior discourse or spatio-temporal context. Examples are shown in the first column of Table 3.23. In the second column these arguments are made overt with a demonstrative.
Table 3.23: PP Adverbial with Relational Indefinite AO Internal Argument
# | PP Adverbial | Explicit Argument
2 | as an alternative | as an alternative to that
2 | as a consequence | as a consequence of that
1 | as a restatement | as a restatement of that
84 | as a result | as a result of that
1 | for an example | for an example of that
We introduced syntactically optional arguments in Section 3.4, using examples such as (3.50).
(3.50 a) The due date for the grant has passed. Mary didn’t apply for it.
(3.50 b) The due date for the grant has passed. Mary didn’t apply ∅.
(3.50 c) *Mary didn’t apply ∅.
Intuitively, when one applies, they apply for something; in (3.50 a), this (for it) argument is overt and anaphoric to the grant in the prior sentence, while in (3.50 b), it is implicit but retrievable from the context. In (3.50 c) it is not syntactically optional because the context doesn’t supply it.
Similarly, if something is a result, consequence, or restatement, etc., it is a result, consequence, or restatement of some specific something. Thus, we can make this semantic argument overt, as in (3.87 a), or leave it implicit, as in (3.87 b), because it is retrievable from the context. However, the discourse is infelicitous in (3.87 c) because the context doesn’t supply it.
(3.87 a) The due date for the grant has passed. As a result of that, Mary didn’t apply for it.
(3.87 b) The due date for the grant has passed. As a result ∅, Mary didn’t apply for it.
(3.87 c) *As a result, Mary didn’t apply for a grant.
These nouns are thus binary predicates. Other nouns are also viewed as binary predicates, as discussed in Section 3.4, including part (of something) and mother (of somebody). In predicate logic, such binary predicate nouns can be represented as shown in (3.88).
(3.88) [[result]] = ∃x∃y[result(y,x)]
What predicate logic cannot represent, however, is whether or not this hidden argument has to be resolved in the prior discourse. As discussed in Section 3.4, implicit arguments have been distinguished into two types: definite and indefinite. As exemplified above in (3.50), the implicit argument of apply is definite; it must be anaphoric to something in the prior context in order to be interpreted. In contrast, the implicit argument of mother is indefinite; in Section 3.4, (3.53) was used to exemplify indefinite hidden arguments, which are not necessarily anaphoric with anything; their interpretation can be independent of context.
(3.53) Today, Mary talked to a mother.
Although the verb donate in Section 3.4 (example 3.54) indicates that some implicit arguments can either be definite or indefinite depending on context, the implicit arguments of the indefinite nouns in Table 3.23 appear to always be definite; at least when found as the internal arguments of these PP adverbials, these indefinite nouns cannot be interpreted independently of context and thus the containing PP adverbials can never be discourse-initial. Moreover, many of these arguments, implicit or explicit, appear to be abstract objects. For example, a result is a result of a cause, which is an abstract object. A restatement is a restatement of a statement, which is also an abstract object. Recall however the discussion of AOs in Chapter 2, where we noted Vendler’s observation that some
    • concrete objects such as fire, blizzard can be interpreted as causes. Of course, it can be difficult to distinguish whether a particular case takes a hidden argument, or merely a potential adjunct, especially when this argument appears to be indefinite. This is true of many of the indefinite nouns shown in Table 3.22. For example a sense, found in the PP adverbial in a sense in Table 3.22, is likely interpreted as a sense of something. This something is clearly indefinite, however, like the somebody argument of mother, and so it does not cause the containing PP adverbial to be uninterpretable discourse-initially. More generally, we can use the tests (e.g. entailment and presupposition) discussed in Section 3.4 that have been used to distinguish verbal arguments and adjuncts, but these tests are not foolproof. As one might expect, the same variability in form we have seen displayed by the internal argument of our PP adverbials can be displayed by these syntactically optional arguments. For example, in Table 3.23, we made the argument explicit with a demonstrative, but this argument can also appear as a full noun phrase, and if the noun phrase is not anaphoric, as in (3.89), no prior context is needed to interpret the adverbial. (3.89) As a consequence of war, people die. Furthermore, the same variability in resolution we have seen displayed by the internal arguments of our PP adverbials can be displayed by these syntactically optional arguments. For example, in (3.90 a), the interpretation of the demonstrative resolves to the abstract object interpretation of the VP, e.g. giving him a book, but in (3.90 b), the interpretation of the demonstrative is ambiguous; it can resolve to giving him a book or to the NP a book. (3.90 a) If John is bored, give him a book. As an alternative to that, take him to the zoo. (3.90 b) If John is bored, give him a book. As an alternative to that, give him a magazine. There is a distinction, however, between these binary nouns that we have so far ignored. While the hidden arguments of result, consequence, restatement can resolve to single AO interpretations, the hidden argument of example must resolve to a set of interpretations. This property of example is noted in [WJSK03]. For example, in (3.91 a), this argument resolves to the set of consequences of John breaking his arm, as is made overt in (3.91 b). 121
(3.91 a) John just broke his arm. So, for an example, he can’t cycle to work now.
(3.91 b) John just broke his arm. So, for an example of the consequences of breaking an arm, he can’t cycle to work now.
This analysis readily extends itself to generic and plural nouns, which are quite frequently found in our corpus as internal arguments of PP adverbials. Again, these nouns may be concrete objects; Table 3.24 provides some examples, along with their counts.
Table 3.24: PP Adverbials with Generic or Plural Concrete Object Internal Arguments
# | PP Adverbial | # | PP Adverbial
1 | among professionals | 1 | in academia
2 | as artists | 1 | like lemmings
1 | below decks | 1 | to libertarians
2 | for corporations | 1 | within institutions
Our corpus also contains many examples of plural or generic nouns which denote abstract objects; Table 3.25 provides some examples along with their counts.
Table 3.25: PP Adverbials with Generic or Plural AO Internal Arguments
# | PP Adverbial | # | PP Adverbial
6 | at times | 1 | in emergencies
3 | by law | 1 | in truth
105 | in fact | 1 | on occasion
4 | in practice | 1 | on reflection
2 | in reality | 2 | over time
1 | in theory | 2 | without question
Like indefinites, the semantics of generic and plural nouns is the subject of current research (c.f. [Roo95b, vdB96, HK98]). Generally, these generic nouns can be represented either akin to indefinites, e.g. as unary predicates denoting an unspecified entity, or akin to plural nouns, which can be represented as unary predicates denoting an unspecified set of entities, as shown in (3.92)21.
(3.92) [[professionals]] = ∀x.professional(x)
21 The quantifier suggests iteration, which is not always the case.
Whether they denote concrete objects or abstract objects, the generic-ness and plurality of these nouns does not cause them to retrieve entities in the prior discourse. Again, however, certain of these adverbials (e.g. in fact, of course) are frequently viewed as discourse connectives; as stated above, we argue here only that their semantic interpretation does not require an abstract object in the prior discourse, and will return to this issue in Chapter 5.
However, as we also found for indefinite nouns, certain generic and plural nouns are relational, and appear to take a syntactically optional, or hidden, or implicit, argument that is anaphoric to some salient entity in the prior discourse or spatio-temporal context. There are many examples of such generic nouns found as the internal arguments of PP adverbials in our corpus. Some examples are shown in Table 3.26.
Table 3.26: PP Adverbials with Relational Generic AO Internal Arguments
# | PP Adverbial | # | PP Adverbial
1 | as evidence | 3 | in effect
1 | by analogy | 3 | in essence
10 | by comparison | 1 | in exchange
28 | by contrast | 3 | in part
167 | for example | 6 | in response
70 | for instance | 8 | in return
204 | in addition | 5 | in sum
1 | in comparison | 117 | of course
2 | in conclusion | 3 | on average
1 | in consequence | 3 | on balance
16 | in contrast | 1 | on reflection
Again, it can be difficult to distinguish whether a particular case takes a hidden argument, or merely a potential adjunct. It appears that all of the (assumed) hidden arguments in Table 3.26 can resolve to the AO interpretation of a VP or S in the prior discourse. For example, in (3.93), the implicit (or explicit demonstrative) argument resolves not to one of the entities John, movies, Mary, but to the action of John going to the movies with Mary.
(3.93) John went to the movies with Mary. In exchange (for that), she gave him a back rub.
However, not all of these arguments are clearly definite. As in Section 3.4, where we saw that
    • the definiteness of the optional argument of donate was to some extent dependent on context, so do we see a similar dependency for on average and in essence. For while an average is an average of (or over) something, and an essence is an essence of something, only in (3.94 a) and (3.95 a) are these “somethings” anaphoric to an AO interpretation in the prior context. In (3.94 b) and (3.95 b) these “somethings” are interpreted independently of context. (3.94 a) We choose the actors, we build the sets, and we keep the books. In essence, we run the show. (shortened WSJ example) (3.94 b) In essence, I am a happy person. (3.95 a) Mike washes the dishes. Mary dries them and puts them away. On average, they do about the same amount of work. (3.95 b) On average, John is a happy person. Furthermore, as we saw with indefinites, many of the syntactically optional arguments in Table 3.26 can resolve to a set, including the optional arguments of instance, part, sum, average, and essence. We will discuss sets in greater detail in Section 3.6 and Chapter 5. 3.5.4 PP and ADJP Modifiers So far we have only seen actual corpus examples of single noun internal arguments. Modifying these internal arguments with PPs and ADJPs can also create discourse adverbials. Table 3.27 presents approximate counts in our corpus where the internal PP argument is a single noun modified by a single PP or ADJ. Table 3.27: Approximate Counts of Tokens and Types of some Internal Argument Modifiers # Tokens 1491 1926 # Types 1400 1143 Modifier single noun with PP modifier single noun with ADJ modifier Clearly, PP modifiers of the internal arguments of PP adverbials can create discourse adverbials because PP modifiers themselves contain an internal argument, and it can be analyzed in the same way that we have already been analyzing the single noun internal arguments of PP adverbials. 124
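Schematically, and only as an illustration in the notation used above (the relation label result-of is ours), the analysis simply recurses through the modifying PP:
(i) [[as a result of NP]] = λp.result-of(p, [[NP]])
(ii) [[as a result of that attitude]] = λp.result-of(p, a), where a is the attitude AO retrieved by the demonstrative that attitude from the prior discourse.
Whether the whole adverbial functions as a discourse or a clausal adverbial thus turns on how the innermost NP is interpreted.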
    • As a particularly relevant example, in Tables (3.29)-(3.28), we provide corpus cases in which the syntactically optional semantic arguments of binary definite, indefinite, and plural or generic nouns, respectively, are made syntactically overt by a PP. Table 3.28: Binary Definite Internal Argument with Overt Argument # 1 1 1 1 1 1 1 1 1 1 PP Adverbial at the beginning of the hippodrome at the end of the day at the insistence of Arturo Toscanini at the outset of his career in the case of academic personnel in the midst of it all in the course of this on the basis of this careful reading since the start of the decade with the exception of satires of circumstance Table 3.29: Binary Indefinite Internal Argument with Overt Argument # 1 1 1 1 1 8 1 1 1 PP Adverbial as an evocation of time past as an example of this last facet as an illustration of the principle of simplicity as an indicator of the tight grain supply situation in the U.S. as an introduction to American politics as a matter of fact as a part of overall efforts to reduce spending as a result of that attitude in a series of fairy tales and fantasies The NPs within these syntactically optional PPs again display a variety of novel abstract objects, conveyed in a variety of forms. For example, in the above two tables we find demonstrative pronouns (e.g. this in in the course of this), demonstrative nouns (e.g. that attitude in as a result of that attitude), generic nouns (e.g. simplicity in the principle of simplicity in as an illustration of the principle of simplicity), etc. As already noted above in constructed examples, making the syntactically optional argument of a relational noun overt may move the burden (of classifying the adverbial 125
    • containing it as a discourse or clausal adverbial) to the interpretation of this overt argument. More generally, regardless of whether or not a PP modifier is a semantic argument of the noun it modifies, it introduces additional entities that may or may not refer to AOs in the prior discourse. Table 3.30: Binary Generic or Plural Internal Argument with Overt Argument # 5 1 1 1 1 1 1 1 1 1 1 4 1 1 3 1 2 PP Adverbial as part of the agreement by means of this social control in accordance with legislation passed at the last session of Congress in addition to free massages in anticipation of that shift in case of a deadlock between prison boards and inmates in connection with this conference in continuation of these theoretical studies in contrast to all this in exchange for higher price supports in light of all this in point of fact in reaction to proposed capital-gains legislation in response to this in spite of this in terms of volume on top of all this Adjectival modification of an internal argument can also cause it to be interpreted with respect to discourse or spatio-temporal context. The simplest cases in which to see this are PP adverbials whose internal argument is modified by an adjective whose interpretation depends on the spatiotemporal properties of either the discourse or the context. Some examples are shown Table 3.31, modifying a noun that can be interpreted as an abstract object. Table 3.31: Internal Argument with a Spatio-Temporal ADJ # 3 22 PP Adverbial under current rules in recent years # 1 1 PP Adverbial for present purposes under modern conditions There are many other different classes of adjectives in our corpus; adjectives are as varied (and 126
    • as difficult to impose classification on) as adverbs. One major functional class found in our corpus includes adjectives ([MW]) that can function as determiners, including distributives (all, both, either, each, every...), cardinal numbers (one, two...), other quantifiers (few, some, any...), and “difference words” ([gra]) or “alternative phrases” ([Bie01]) (another, other, such, same). Some distributives cause their associated noun to be interpreted with respect to an entity in the prior discourse. When this associated noun is an abstract object, it is interpreted with respect to an AO in the prior discourse. Some examples of PP adverbials containing such distributives and an associated abstract object are shown in Table 3.32. In general, nouns modified by distributives refer to groups or individuals in groups from in the prior discourse context ([gra]). [HK98] uses of partial function to represent these distributives semantically, akin to the definite article, but where a specified number or set of entities is denoted. Table 3.32: Internal Argument with Referential Adjective # 1 5 1 1 PP Adverbial by both standards in both cases in both respects in both conditions # 7 1 5 1 PP Adverbial in each case in each instance in either case in either event Other distributives, along with the cardinals and quantifiers noted above, do not elicit an entity in the prior context for their interpretation. Some examples from our corpus of PP adverbials containing these determiners and an associated abstract object are shown in Table 3.33. Because the NP they produce is non-referential, [HK98] uses total functions to represent these determiners, such that one and some NP denote an unspecified entity akin to indefinite nouns, and the remainder denote sets of unspecified entities, akin to plural nouns. Nevertheless, certain of the adverbials in these tables have been treated as discourse adverbials, including in any event. We will investigate possible causes of this in Chapter 5. Many of these determiners can function as noun modifiers; the first column of Table 3.34 shows NPs containing these modifiers and additional determiners. Of course, the semantics of the determiner must not contradict the semantics of the modifier (e.g. “few one hand”), but may effect its 127
    • referential properties, as in on the one hand, where the may cause one hand to refer to a situation (metaphoric hand) in the prior context. Leaving the modified argument implicit as shown in the second and third columns may also effect the referential properties of these adverbials; some and one require this argument to be resolved. We will discuss other analyses of after all in Chapter 5. Table 3.33: Internal Argument with Non-Referential Adjective # 7 21 1 1 2 1 1 1 7 1 2 PP Adverbial at one point for one thing in one case in one sense in one way in two cases on two occasions for many reasons in many cases in many instances in many ways # 1 1 4 2 1 3 2 5 4 4 1 PP Adverbial by all accounts by all means in all cases in all fairness in all probability by some estimates for some reason for some time in some cases to some extent in every period # 6 1 24 13 1 7 1 1 5 2 1 PP Adverbial at any rate by any measure in any case in any event in any instance in most cases by most accounts in several instances with few exceptions in certain respects under certain circumstances Table 3.34: Internal Argument with Determiner and Non-Referential Adjective # 1 1 7 8 PP Adverbial in a few instances in a certain sense for the most part on the one hand # 9 49 3 2 PP Adverbial above all after all in all in one # 1 1 1 2 PP Adverbial to some for those few at the most for one Ordinals and ordinal-like adjectives are also found in our corpus, with or without determiners, as shown in Table 3.35. As shown in the third column, their argument can be implicit. We will discuss ordinals further in Section 3.6. “Difference words”, or “alternative phrases” also cause their associated noun to be interpreted with respect to an entity in the prior discourse. When this associated noun is an abstract object, it thus will be interpreted with respect to an AO interpretation in the prior discourse. Some examples of PP adverbials containing alternative phrases and an abstract object are shown in Table 3.36. Another and other invoke something “different, remaining, or additional”, thus they are called 128
    • Table 3.35: Internal Argument with Ordinal Adjective # 1 6 2 PP Adverbial in a first step for the first time in the first instance # 1 1 1 PP Adverbial on second thought at last report in the second place # 17 16 3 PP Adverbial at last at first in the second Table 3.36: Internal Argument with Alternative Phrase # 1 1 1 1 PP Adverbial in another approach in another case in another respect in another sense # 20 24 2 1 PP Adverbial among other things in other words on other matters among other issues # 1 2 2 1 PP Adverbial in such cases in such circumstances in such situations on such occasions “difference words” in [gra]; other and such also create what are called alternative phrases in [Bie01]. As determiners, another is used with singular, and other with plural nouns, but they also function as modifiers; the first column of Table 3.37 shows NPs containing these modifiers and additional determiners. The second column presents another anaphoric “difference” adjective, same, that requires a determiner. The third column shows that the modified noun can sometimes be implicit. Table 3.37: Internal Argument with Determiner and Alternative Phrase # 49 1 PP Adverbial on the other hand in many other instances # 71 2 PP Adverbial at the same time in the same way # 1 2 PP Adverbial for another on the other [WJSK03, Bie01, Mod01] discuss the semantic interpretation or resolution of other, such, and other alternative phrases in detail. They views the form other X as a lexical anaphor which refers to the result of excluding an entity or set of entities from a contextually relevant set, and presupposes that the excluded entity or entities belong to that set. For example, many other instances refers to the set of instances that result from excluding one (or more) instance in the discourse context from 129
    • a larger presupposed set of instances. The form another X (e.g. an other X) can be treated similarly; for example, another approach refers to the approach that result from excluding one (or more) approaches in the discourse context from a larger presupposed set of approaches. [Bie01] treats the form such X as including, rather than excluding, members of a set. In his terms, for example, such situations refers to the set that results from using one (or more) situations in the discourse context as an example of a presupposed set of situations that also includes the set referred to by the such phrase. The form the same X appears to be a direct reference, akin to that X, although there is some subtlety afoot in that a new entity can be introduced via the determiner, albeit identical in all respects to the original. [Bie01]’s analysis also incorporates the non-anaphoric counterparts (e.g. X’s other than/such as/the same as Y) of these phrases. These counterparts do not appear in our data, and sound relatively awkward as internal arguments of PPs (e.g. in many other instances than this). Bierner, following [McC88], treats these optional PP phrases which instantiate the excluded or included element as adjuncts rather than arguments([Bie01], pg. 28), and uses the AI planning heuristic, “use existing objects” [Sac77] to resolve them. In both Bierner’s and our terms, the excluded or included anaphoric element is represented as a hidden argument. Its interpretation may be partially determined by the modified noun (e.g. it’s an instance in other instances; however, the interpretation of cases such as on the other is wholly dependent on context, because the modified noun is implicit. Bierner’s analysis also extends to other adjectives encountered in our corpus. These are the comparative and superlative adjectives, which by definition are dependent in an idiosyncratic way on reference to at least one other object that may be found in or inferable from in the prior discourse or spatio-temporal context (c.f. [Bie01, WJSK03]) . Some examples of abstract objects modified by these adjectives are shown in Table 3.38. Table 3.38: Internal Argument with Comparative/Superlative Adjective # 1 1 1 PP Adverbial to better purpose in broadest terms in earlier reshufflings # 1 1 1 PP Adverbial with greater precision with minor exceptions on a deeper plane 130 # 1 2 1 PP Adverbial on further reflection in the short run for smaller newspapers
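One schematic way to capture this dependence, offered only as a sketch (the measure function depth and the standard s are our labels), is to give the comparative a hidden standard-of-comparison argument:
(i) [[deeper]] = λx.depth(x) > depth(s), where s is resolved to an object found in, or inferable from, the prior discourse or spatio-temporal context.
When the modified noun denotes an abstract object, as in on a deeper plane, the standard s is itself naturally resolved to an AO interpretation.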
    • There are similar cases found in our corpus where the modified element is left implicit and there is often no determiner. Some example are shown in Table 3.39; these will be discussed further in Section 3.6. Table 3.39: Internal Argument with Other Set-Evoking Adjectives # 18 17 3 PP Adverbial in general in particular at best # 1 2 1 PP Adverbial in the main on the whole at worst # 41 11 13 PP Adverbial at least as usual in short While there is a much more varied array of adjectives displayed in our corpus than there is space to cover here, major classes have been discussed which cause abstract objects to refer to the prior discourse. Many of the other adjectives in our corpus (in addition to some of those discussed above), fall to a greater or lesser extent into the adverb modification types discussed in Section 3.3. 3.5.5 Other Arguments S and PPs are also found as internal arguments of the PP adverbials in our corpus. Moreover, in addition to internal argument modifiers, there are also adverb modifiers of the P head itself. The remainder of our corpus consists of complex combinations of the internal arguments already mentioned. Corpus counts of these types are shown in Table 3.40. Table 3.40: Approximate Counts of Tokens and Types of some Internal PP arguments # Tokens 692 192 166 2526 # Types 679 180 166 2361 Internal Argument S arguments PP arguments ADVP modifiers complex combinations For our purposes, the analysis of PP internal arguments is identical to the analysis of the PP modifiers described above, with respect to abstract objects and distinguishing discourse and clausal adverbials. However, because annotators frequently confuse PP adverbials containing PP internal 131
    • arguments with ADVP adverbials, we will investigate some particular cases in the next section. Annotators also frequently confuse PP adverbials containing adverb-modified P heads with ADVP adverbials; these too will be discussed in the next section. Also for our purposes, S-modifying phrases taking S internal arguments simply provide another form in which two AO interpretations are related. However, based on the Penn Treebank I POS tagging and bracketing guidelines, all PP adverbials taking S arguments should not have been present in our data. For though Penn Treebank POS tags make no distinction between prepositions and subordinating conjunctions (both are tagged as IN), the argument to the latter is a clause and the entire phrase is bracketed as a subordinating clause[PT]. However, an explicit list of subordinating conjunctions is not provided. In [Lit98], a subordinating conjunction is defined as a linguistic form that makes a clause a constituent of another clause, and the list of subordinating conjunctions is viewed as a research question; [Lit98] identifies 107 main entries and 374 senses for subordinating conjunctions in a Webster’s dictionary, including prepositional, adverb, and verb phrases. Roughly half of our S arguments contained a verb participle and lacked a subject, and were parsed as reduced clauses. Some shorter examples are shown in Table 3.41. Table 3.41: PP Adverbial with Reduced Clause Internal Argument # 1 1 2 1 2 PP after beating them as already noted as expected as might be expected as previously reported # 1 1 3 1 1 PP without saying so for winning in doing so in so doing on delving deeper These cases are interesting because it can be difficult to decide whether to treat some phrase as reduced clauses or sub-clausal constituents, especially when modified by a preposition that also functions as a subordinating conjunction. This is particularly an issue for arguments lacking both a verb and a subject, such as “while in hiding” or “as usual”. Moreover, some lexical items, such as as, can be both prepositions and adverbs (e.g. twice as long). Adverbs can modify verbs, and thus we could represent phrases such as as expected either as subordinating clauses or as ADVP 132
    • adverbials. [Kno96], in fact, treats this expression as an adverbial discourse connective without considering its composition. He does not, therefore, address the closely-related PP, as might be expected, or its numerous other possible variations. The remaining PP adverbials in our corpus constitute cases that combine the modification and argument types discussed above, including multiple modifications, coordinated phrases, and relative clauses, etc. And again, their semantic function as discourse or clausal adverbials can be determined according to their composition, as discussed above. 3.5.6 Summary Table 3.42: PP Adverbial Summary Internal Argument Type proper noun animate pronoun potential DD definite AO demonstrative AO indefinite AO with hidden AO argument indefinite AO with hidden AO argument definite AO with hidden AO argument PP Adverbial in New York to me after that at the time for that reason as a result (of that) † as an example (of that) † at the very least … … In this section saw that the majority of PP discourse adverbials do not occur frequently. Rather, there are a few “stock” discourse connectives, such as as a result, and a wide variety of other PP adverbials which contain a semantic argument that must be resolved to an AO interpretation in the prior discourse or spatio-temporal context. We have presented a wide variety of abstract objects found in our corpus of PP adverbials. We have shown that PP clausal and discourse adverbials can be distinguished by the interpretation the semantic arguments they contain. We have presented a variety of semantic mechanisms that can cause these arguments to be interpreted with respect to the discourse or spatio-temporal context. Ignoring the open question of small clause internal arguments, a summary of some basic mechanisms which can cause these arguments to reference an 133
    • abstract object in the surrounding context are shown in Table 3.42. 3.6 S-Modifying ADVP Adverbials As discussed in Section 3.2, and represented in Figure 3.6, S-modifying ADVP adverbials adjoin to an external S argument and are frequently composed of a single adverb head, but may additionally contain an internal argument and one or more specifiers. [Figure 3.6: Syntactic Structure of S-Modifying ADVP Adverbials: an ADVP, consisting of an ADV head with an optional internal argument (Arg 2) and an optional specifier, adjoined to an external S argument.] In prior sections we saw [Moo93]'s use of predicate logic to represent S-modifying ADVP that take only one semantic argument; here we illustrate the interpretation of such ADVP in lambda calculus (c.f. [HK98]) as shown in (3.96), using the ADJ (or NP) derivative of the adverb to represent the property adjective they supply to their (external) semantic argument. For example, if wise is taken to represent the property supplied by the adverb wisely, then the interpretation of the ADVP wisely can be represented as in (3.97). We resolve y to the semantic interpretation of the external S argument. If engage(j,s) is taken to represent the semantic interpretation of the sentence John engaged the safety, then the interpretation of the S-adjoined ADVP adverbial is as in (3.98). (3.96) [[ADVP]] = λy.adjective(y) (3.97) [[wisely]] = λy.wise(y) (3.98) [[Wisely, John engaged the safety]] = wise(engage(j,s)) If we exchange wise for additional, however, we produce the interpretation of the discourse adverbial additionally. In the remainder of this section we focus on how the semantic mechanisms underlying the interpretations of the adverbs in our corpus distinguish ADVP clausal and discourse adverbials. We will show that many S-modifying ADVP adverbials take an additional semantic 134
    • argument and can be represented as binary predicates, and we will also see more complex semantic representations for certain adverbs. However, the ADVP adverbials in our data set are a motley crew, due to the wide variability of adverbs. For example, while many adverbs constitute an entire ADVP (e.g. carefully), a large number of ADVP adverbials in our corpus contain adverbs modified by another adverb (e.g. more carefully) that can also affect their interpretation. Moreover, we find frequent tagging errors in our ADVP corpus which serve to illuminate the interpretations of adverbs, as they display patterns also common to correctly tagged ADVP adverbials. 3.6.1 Syntactically Optional Arguments Approximate counts in our corpus of the tokens and types of the ADVP adverbials discussed in this section are shown in Table 3.43. Table 3.43: Approximate Counts of Tokens and Types of some ADVP Adverbials # of Tokens 85 78 # of Types 55 18 Category PP-like ADVP with optional arguments relational ADVP with optional arguments There are a number of constructions in our ADVP corpus that should have been (and in our PP corpus have been) tagged as PP adverbials. Some examples are shown in Table 3.44. The first column lists the initial preposition-preposition construction (COMB); when the second preposition varies, we list it as P. The second and third columns indicate the number of times this combination was tagged as PP and ADVP, respectively. The last column provides a corpus example. Table 3.44: Mis-Tagged PP Adverbials COMB as P because of except for prior to # PP 94 12 12 4 # ADVP 3 54 1 6 Example as for the merits because of this except for the embarrassment prior to that The as P construction is likely mistagged as an ADVP due to the use of the adverb as in other 135
    • constructions discussed below. The because of and except for constructions may be mistagged as ADVP adverbials due to their similarity in form to the ADVP construction shown in Table 3.45, in which an ADVP adverbial takes an optional PP. These ADVP adverbials were also mistagged as PP adverbials, as shown. Table 3.45: PP-like ADVP Adverbials with Overt Arguments COMB instead of rather than regardless of # PP 7 2 0 # ADVP 37 11 5 Example instead of that rather than tempt people to buy more regardless of rights and wrongs However, as shown in Table 3.46, and unlike the PP internal arguments of the PP adverbials shown in Table 3.44, the PP of these ADVP adverbials are syntactically optional. Although as shown regardless doesn’t occur in our corpus (sentence-initially) without its accompanying PP, this author recently heard such a use on National Public Radio. Table 3.46: PP-like ADVP Adverbials with Hidden Argument # 18 14 0 ADVP Adverbial instead rather regardless These ADVP adverbials can be represented as binary predicates, akin to PP adverbials. However, recall that [Bie01, McC88] do not treat the optional than PP in other than phrases as a complement, but rather as an adjunct, and [Bie01] then uses the AI heuristic “use existing objects” to resolve the hidden argument to the internal argument of the PP. While the representation of this PP may vary depending on the chosen formalism, we nevertheless argue that these ADVP adverbials take a definite hidden semantic argument. Moreover, the interpretation and/or resolution of this hidden argument can be an concrete object, an abstract object NP, or an abstract object reified from a VP or S in the prior discourse. (3.99)- (3.101) show that although the argument can be explicit or implicit (a-b cases) and still resolve to the AO interpretation of VP or S, the modified sentence 136
    • cannot be interpreted if the argument is not resolvable in the discourse or context (c cases). (3.99 a) I wanted to go to the movies. Instead of that, I went to work. (3.99 b) I wanted to go to the movies. Instead, I went to work. (3.99 c) *Instead, I went to work. (3.100 a) Michael won't study biology. Rather than that, he dreams about it. (3.100 b) Michael won't study biology. Rather, he dreams about it. (3.100 c) *Rather, Michael dreams about biology. (3.101 a) Mary might want to come by today. Regardless of that, I'm going to the movies. (3.101 b) Mary might want to come by today. Regardless, I'm going to the movies. (3.101 c) *Regardless, I'm going to the movies. Regarding the prior to construction shown in Table 3.44, there are other similar constructions tagged as ADVP adverbials in our ADVP corpus, shown along with their counts in Table 3.47. All of these constructions contain relational adjectives with overt PP arguments. [MW] lists all five constructions as (complex) prepositional phrases. Only prior to is contained in our PP adverbial corpus; however, all five were contained in our ADVP adverbial corpus, albeit only once each. Table 3.47: Relational ADJP with Overt Argument # 1 1 1 1 1 Relational ADJP contrary to these expectations due to the earthquake in San Francisco irrespective of the outcome in centuries elapsed since splitting short of fleeing to Warrenton, Virginia or Rockville, Maryland subject to certain constitutional restraints in favor of fair trials As discussed in Section 3.2, many adverbs are derived from adjectives or nouns; some ADVP discourse adverbials, not surprisingly, contain adverbs derived from relational adjectives or nouns. Some corpus examples are shown in Table 3.48. The third column shows the (or one of the possible) relational noun or adjective from which each adverb is derived, along with a demonstrative instantiation of its argument. Most adverbs derived from relational nouns or adjectives sound awkward with this argument made explicit, and thus don't appear in our corpus. Simultaneously however did 137
    • appear in the corpus with an overt PP argument, as shown (partially) in the third column. Many of these relational nouns and adjectives also have corresponding PP adverbials (e.g. in addition to this). Table 3.48: Relational ADVP Adverbials with Hidden Argument # 12 3 1 1 8 5 1 3 6 12 2 2 12 8 1 Relational ADVP accordingly additionally alternately analogously consequently conversely contrarily currently incidentally recently previously separately similarly simultaneously subsequently Relational ADJP/NP Derivative according to this additional to this an alternate to this analogous to this a consequence of this a converse to this contrary to this current with this incidental to this recent to this previous to this separate from this similar to this simultaneously with the anode surface temperature.... subsequent to this We saw in the prior section that making the optional argument of a relational noun explicit can move the burden (of classifying the adverbial as a discourse or clausal adverbial) to this explicit argument, because it can take a variety of forms that may or may not depend on context for their interpretation, and may or may not be an abstract object. The same variety can be displayed by the explicit optional arguments of these relational adjectives. Interestingly, simultaneously can find both arguments within the clause it modifies, as in (3.102 a), where the events of Mary hearing the noise and Mary’s husband hearing the noise are interpreted as simultaneous. The coordinating conjunction, which also asserts the two events semantically and takes both arguments structurally,   may play a role here; see [CFM 02] for discussion. The same use of simultaneously can also resolve its hidden argument to the prior discourse, as in (3.102 a), where Fred seeing a bright flash can be interpreted as the event that is simultaneous with the event(s) of Mary and her husband hearing a noise. Discussion of clause-internal resolution can also be found in [WJSK03] and Chapter 4. 138
    • (3.102 a) Simultaneously, Mary and her husband heard a noise. (3.102 a) Fred saw a bright flash of light. Simultaneously, Mary and her husband heard a noise. 3.6.2 Context-Dependent ADVP Adverbials Approximate counts in our corpus of the tokens and types of the ADVP adverbials discussed in this section are shown in Table 3.49. Table 3.49: Approximate Counts of Tokens and Types of some ADVP Adverbials # Tokens 687 919 # Types 134 26 Category PP-Related ADVP Deictic ADVP Just as we saw in our PP corpus, in our ADVP corpus too there are a few cases,exemplified in (3.103), where a subordinating conjunction containing an internal S argument and modifying an S is misparsed as an ADVP adverbial. In many other cases found in our ADVP corpus, however, such a conjunction appears sentence-initially and the modified S is its internal S argument. The other conjoined clause is to be found in the prior discourse. Counts of such cases are shown in Table 3.50. (3.103) Once you get the feel of it Table 3.50: ADVP Adverbial Conjunctions # 2 212 88 ADVP Conjunction although however so # 15 1 ADVP Conjunction though unless These cases are generally viewed as adverbials (and thus may be correctly tagged, although Penn Treebank does not explicitly indicate their treatment), because one argument is not structural (with respect to the sentence structure). DLTAG currently treats the adverbial however (and likely the others as well) as an anaphoric discourse connective. The form of these adverbials cannot be decomposed without a sincere historical analysis; they nevertheless clearly require information in the 139
    • prior discourse for their interpretation. Only with however, as shown in (3.104), does the meaning change perceptibly between the two forms. Its meaning as a subordinating conjunction (3.104 a) conveys a "manner", while its meaning as an adverbial (3.104 b) conveys denial of expectation (c.f. [Kno96])22. There may also be a difference in the temporal ordering between the two; the event modified by adverbial however is interpreted as occurring after the event described in the first sentence, while the event modified by conjunctive however is interpreted as occurring before the event described in the first sentence. (3.104 a) John cannot seem to learn to tie his shoes, however (much) he tries. (3.104 b) John cannot seem to learn to tie his shoes. However, he tries. Also as we saw in our PP corpus, many subordinating conjunctions function as PP adverbials when their internal argument is a sub-clausal constituent. Twelve such cases are misparsed as ADVP adverbials in our ADVP corpus; in five cases there is a preceding adverb modifying the preposition head. An example is shown in (3.105). (3.105) Shortly after the Vale incident However, three lexical items that function as subordinating conjunctions and prepositions are also found in our corpus functioning as adverbs, in that they appear without an internal argument and in [MW] are treated as adverbs (when modifying VP or S). For our purposes, these adverbials contain definite semantic arguments that must be resolved to information in the prior discourse or context. In our corpus these cases occur six times, always with a specifier. Three examples are shown in (3.106)-(3.108). Moreover, the preposition after also has a synonymous adverbial form in which the internal argument is hidden, shown in (3.109), which occurs in our corpus four times and whose hidden argument is, like the others, anaphoric to information in the discourse or context. (3.106) Twenty years before (3.107) Shortly after (3.108) Ever since (3.109) Afterwards 22 It would be interesting to trace the development of these two distinct interpretations. 140
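A minimal sketch of this treatment, again only for illustration (the relation after and the reference time t are our labels), keeps the binary predicate of the preposition and simply leaves its second argument covert:
(i) [[afterwards]] = λp.after(time(p), t), where t is a definite hidden argument resolved to the time of an eventuality AO in the prior discourse or spatio-temporal context.
Specifiers such as shortly in (3.107) or twenty years in (3.106) can then be taken to measure the distance between time(p) and the resolved t.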
    • Similarly, there are lexical items that don’t function as subordinating conjunctions but do function as prepositions, and that are found in our corpus functioning as adverbs, in that they too appear without an overt internal argument. Again, this information must still be resolved in the prior discourse or context. One item, shown in (3.110), appears twenty-four times. There are numerous other examples that occur once or twice, including below, throughout, outside23 , although the last is ambiguously a noun24 . (3.110) Besides Although syntactically, prepositions that modify a VP or S may be treated as adverbs when their syntactic argument is missing25 , at the semantic level there is no reason to change their representation; in the latter case their hidden argument simply doesn’t come structurally. [Bie01] provides a semantic analysis of “Xs besides Y” (e.g. other search engines besides BidFind) as an alternative phrase that is similar to his treatment of the phrase “other Xs than Y”; other and besides in these constructions exclude X (the ground) from Y (the figure) and presuppose that both X and Y belong to the same set of alternatives. However, while Bierner considers the anaphoric form, “other Xs”, where Y is resolved anaphorically, he only considers the structural form of besides, not cases where Y is implicit, because he only studied sub-clausal modification, and this does not occur when besides modifies sub-clausal constituents. It does occur (twenty-four times in our corpus), however, when besides modifies S. An example from WSJ is shown in (3.111). (3.111) Mr. Lang says he isn’t scouting new acquisitions, at least for now. “We would have to go outside to banks to get the money and I am not ready to do that,” he said. “Besides, we have enough on our plate.” Y, the internal argument, could have been explicit, e.g. besides that. Bierner’s semantics may apply to S-modifying besides; in this case, however, the figure would be an AO from the discourse that is excluded from a larger presupposed set of AOs, rather than an concrete object. For example, in (3.111), the use of besides appears to invoke a set of reasons why Mr. Lang isn’t scouting new 23 see [Par84] for a discussion of such “context-bound” modifiers. A few (frequently mistagged as ADVP) nouns can modify S, as discussed in Section 3.2, (e.g. this year). 25 and it is not clear from Penn Treebank or [MW] that this is always the case 24 141
    • acquisitions, including Y from the context (having to go outside to banks), and the modified X (having enough on our plate). Extending Bierner’s semantics to S-modifying besides requires an annotation study, however, such as that described in Chapter 4. Frequently, misparsed PP adverbials found in our ADVP corpus are those in which a particular preposition head is modified by a particular adverb26 . Table 3.51 exemplifies such constructions. The first column lists the phrase-initial adverb-preposition combination (COMB); when the second preposition in the combination varies, we list it as P. The second and third columns indicate the number of times this combination was tagged as a PP and an ADVP, respectively. The last column provides an example of the construction as found in the corpus. Table 3.51: Mis-Tagged PP Adverbial Constructions COMB away from along with apart from aside from back P early in elsewhere in far P PP 0 9 3 10 4 7 1 1 ADVP 2 3 8 8 9 7 2 6 Example away from the general obligation sector along with the note apart from racial problems aside from this back in the U.S.A. early in her life elsewhere in the oil sector far from being minimalist In [MW], some of these constructions are listed as (complex) prepositional phrases. However most PP-modifying adverbs are also found modifying adverbs and nouns in our corpus, as shown in (3.112)-(3.114); far is specified by the anaphoric so, creating an oft-used phrase. (3.112) Back then (3.113) Immediately thereafter (3.114) so far this month More interesting for our purposes, however, is the fact that many of these modifying adverbs can also occur alone or with only a specifier, in which case they are interpreted as spatio-temporal 26 along also functions as a preposition, so this could be a PP adverbial containing an internal PP argument; however, it is similar in meaning to the others included in this table. 142
    • anaphors or deixis, or equivalently as having a hidden argument that is interpreted despite the fact that it is not overt, as exemplified in Table 3.52. One similar adverb, ago, can’t modify a preposition, only an NP; moreover, it requires a specifier. It appeared 132 times in our corpus with 71 different specifiers. NP modification and some more common specifiers are shown in Table 3.53. Table 3.52: Spatio-Temporal ADVP Adverbials # 1 1 2 2 15 ADVP Adverbial a scant half mile away a couple of years back elsewhere shortly soon # 3 2 2 6 32 ADVP Adverbial almost immediately immediately pretty soon thus far so far Table 3.53: Another Spatio-Temporal ADVP Adverbial # 1 1 13 ADVP Adverbial a year ago this fall seventeen years ago today a year ago # 12 8 7 ADVP Adverbial two years ago two weeks ago a few years ago These ADVP, along with the other temporal S-modifiers in Table 3.52 and those in Table 3.54 which are not also used as modifiers, supply temporal information to the modified clause they modify in relation to some other temporal information. Table 3.54: Other Spatio-Temporal ADVP Adverbials # 14 24 ADVP Adverbial already meanwhile # 10 7 ADVP Adverbial presently eventually This other temporal information can be resolved anaphorically, as in (3.115), where immediately refers to the time after the store filed for bankruptcy, or resolved deictically, as in (3.116) (from WSJ), where immediately refers to the time right after the time the text was read. For example, if 143
    • this discourse were read in today's paper, then immediately would refer to just about now. (3.115) The store filed for bankruptcy. Immediately customers flocked to its closing sales. (3.116) This is but one knot in a string of troubles. Last year the store filed twice for bankruptcy. Immediately it will be subject to foreclosure. Similar resolution possibilities are found for modifiers in our corpus, exemplified in Table 3.55, that are usually viewed as "manner" (VP) modifiers, though they too supply temporal features. Table 3.55: Spatio-Temporal Manner ADVP Adverbials # 1 2 7 1 ADVP Adverbial abruptly briefly gradually hastily # 1 1 6 23 ADVP Adverbial instantly quickly slowly suddenly Adopting Ernst's distinction between situation (S) modification and event (VP) modification discussed in Section 3.4, we find that these modifiers can either relate the two temporal boundaries of the event, or can relate the initial temporal boundary of the situation and a time in the discourse or context. For example, in (3.117), quickly can either measure the time between raising the eye dropper and blinking, or the time between the start and finish of blinking. Briefly, moreover, can relate the two temporal boundaries of a speech act, as shown in (3.36), repeated from Section 3.3. (3.117) The doctor raised the eye dropper. Quickly John blinked. (3.36 b) (You asked me what John said.) Briefly, John said he would stop by. We have been addressing adverbs that have an anaphoric or deictic quality with respect to the fact that they must be interpreted with respect to the discourse27, equivalently viewed as a hidden semantic argument. True deixis, however, is also frequently found in our corpus, as shown in Table 3.56 along with their counts28. There are also cases in our corpus in which these deictics are mistagged PP modifiers (e.g. here in Morgenzon...), or are themselves modified (e.g. right now). 27 see [Par84] for further discussion 28 Penn Treebank tags today, tomorrow etc. as NPs, although [MW] views them as adverbs. 144
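The two resolutions just illustrated can be given a common schematic form; the relation abut and the reference time t below are our labels, used only to make the point explicit:
(i) [[Immediately, S]] = abut(t, time(S)), where t is a hidden temporal argument.
In (3.115) t is resolved anaphorically to the time of the bankruptcy filing; in (3.116) it is resolved deictically to the time at which the text is read. Nothing else in the representation changes; anaphoric and deictic resolution are simply two routes to the same hidden argument.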
    • Moreover, from many of these deictics (e.g. hence, then, still, yet, thus) have been derived homonymous relational deixis (discourse adverbials); most of the occurrences in our corpus of these homonyms likely correspond to their discourse adverbial use. The difference between these uses is illustrated using then in (3.118)29 . In (3.118 b), then orders the event it modifies in a temporal sequence relation with the event described in (3.118 a). In (3.118 c), then makes discourse deictic reference to the event described in (3.118 a) and the it-cleft asserts that the temporal coordinates of this event are the same as the temporal coordinates of the event described in (3.118 c). (3.118 a) John and Mary had dinner. (3.118 b) Then he asked her to marry him. (3.118 c) It was then that he asked her to marry him. Table 3.56: Deictic ADVP Adverbials # 26 59 189 292 39 ADVP Adverbial hence here now then there # 41 114 3 1 80 ADVP Adverbial still thus today tomorrow yet From many of these deictics still other other deictics and relational adverbs have been derived as well, as discussed in [Kno96]. Those found in our corpus are shown in Table 3.57. Table 3.57: Deictic-Derived ADVP Adverbials # 1 1 1 1 ADVP Adverbial hereby heretofore nowadays thenceforth # 4 1 47 ADVP Adverbial thereafter thereby therefore # 1 1 1 ADVP Adverbial therefore therein thereupon [Kno96, WJSK03] provide detailed discussions of the lexical semantics of the discourse adverbials in these tables. That many have retained their deictic quality is exemplified in (3.118 b), 29 This distinction was originally pointed out by Dr. Ellen Prince to the DLTAG group 145
    • where the discourse adverbial can also retrieve the related AO from the spatio-temporal context. For example, suppose two parents are watching a child playing in mud at the playground. One parent can make the comment in (3.119), retrieving this AO from the visual context. (3.119) “Then/Therefore he’ll come running over here and put his hands all over me.” 3.6.3 Comparative ADVP Approximate counts in our corpus of the tokens and types of the ADVP adverbials discussed in this section are shown in Table 3.58. Table 3.58: Approximate Counts of Tokens and Types of some ADVP Adverbials # of Tokens 384 77 # of Types 110 77 Category Atomic Comparatives Comparative Constructions Numerous ADVP adverbials in our corpus are comparative or superlative, and thus by definition dependent in an idiosyncratic way on reference to at least one other object in the prior discourse or spatio-temporal context (c.f. [Bie01, WJSK03]). In some cases, a comparison is made between abstract objects. Most of the comparatives adverbs in our corpus are temporal comparatives. Frequently, they appear modifying an S-modifier; the modified head determines what properties of the objects are being compared (e.g. equally often), or may clarify the source of the comparison (e.g. earlier (than that) this year). Some examples are shown in Table 3.59. Table 3.59: Comparative Adverb Modifiers # 15 1 1 ADVP Adverbial earlier this year farther south equally often # 1 10 5 ADVP Adverbial later that day more recently most recently Some comparatives (e.g. farther) don’t appear at all as S-modifiers; others do, exemplified in Table 3.60. In a few instances the element whose properties are being compared with those of the 146
    • modified S is made overt with a PP, as shown in the second column. The corpus example where more appear alone is shown in (3.120); it compares two AOs, one is reified from the interpretation of a clause in the prior context, and appears synonymous with moreover, discussed below. (3.120) Within the Organization of American States, there may be some criticism of this unilateral American intervention which was not without risk obviously. But there was no complaint from the Dominican crowds which lined Ciudad Trujillo’s waterfront shouting, “Vive Yankees”! More, the U.S. action was hailed by a principal opposition leader, Dr. Juan Bosch, as having saved “many lives and many troubles in the near future”. (Brown) Table 3.60: Comparative ADVP Adverbials # 2 30 13 1 ADVP Adverbial earlier later further more # 1 0 0 2 ADVP Adverbial even earlier than that later than that further than that more than ever More frequently in our corpus, comparative ADVP S-modifiers occur with a variety of specifiers that clarify the extent of the comparison. Such cases occur 111 times; in all but six the adverb head is earlier or later. Some examples are shown in Table 3.61. Table 3.61: Specified Comparative ADVP Adverbials # 16 5 ADVP Adverbial a year earlier once more # 1 7 ADVP Adverbial much better a year later There are also adverbs in our corpus that cannot modify other modifiers, but that may have been derived from comparatives, as shown in Table 3.62. Whether their internal comparative accounts for their behavioral anaphoricity is an open question. It is nevertheless clear that these S-modifiers require an AO from the context for their interpretation, and in some cases we can make their hidden argument overt (e.g. otherwise than that). The lexical semantics of some of these discourse adverbials are discussed in [WJSK03, WJSK99, Kno96]. 147
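Schematically (the ordering relation < over times and the reference time t are our labels, used only for illustration), these temporal comparatives pattern with the relational adverbs discussed earlier:
(i) [[earlier]] = λp.time(p) < t, where t is either supplied overtly by a than-PP, resolved anaphorically to the time of an AO in the prior discourse, or anchored deictically.
A specifier then measures the extent of the comparison, so that a year earlier locates time(p) roughly one year before the resolved t.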
    • Table 3.62: Comparative-Derived ADVP Adverbials # 31 4 ADVP Adverbial furthermore likewise # 53 36 ADVP Adverbial moreover nevertheless # 3 19 ADVP Adverbial nonetheless otherwise Also in our corpus are various instantiations of a comparative construction of the form "as + adverb + as", where the inner as PP takes an S, NP, or other sub-clausal element as argument. 50 occurrences of this construction are found in our ADVP corpus; 2 are found in our PP corpus. Some instantiated forms are shown in Table 3.63, along with their (ADVP) corpus counts and a corpus example. There are also five related constructions exemplified in the bottom section of the table. "Just as" and "insofar as" also appeared in our PP corpus. Table 3.63: Comparative Constructions # 1 6 8 10 12 1 4 9 3 3 12 7 Construction as cheerfully as as early as as far as as long as as soon as as surely as as well as so far as insofar as so long as just as no matter WH Examples as cheerfully as possible as early as 1776 as far as i am concerned as long as this point of view prevails as soon as the tremor passed as surely as a seesaw tilts as well as in theme so far as i know insofar as science generates any fear so long as death was not violent just as in the case of every prodigy child no matter how hot the day In all cases, the phrase preceding the internal as (or WH) phrase functions as an ADVP; there is some ambiguity however as to whether the ADVP takes an internal argument or the internal phrase takes an ADVP modifier; the category of these constructions varies in the literature. In Penn Treebank, the "as + adverb + as" constructions are tagged as ADJP when they function as sub-clausal modifiers, as in (3.121), and as NP when they function as arguments to transitive verbs, 148
    • as in (3.122). (3.121) Mary is as tall as John (is tall). (3.122) They yielded as much as 20% of the expected amount. However, [MW, Lit98, PT] all treat as well as as a coordinating conjunction, and [Lit98] also treats the no matter WH construction along with some of the "as + adverb + as" constructions as subordinating conjunctions when the inner phrase is an S. As mentioned in Section 3.5, there is also some question as to whether all arguments of the inner as are not best represented as small clauses. For example, in the same way that we add "is tall" in (3.121) to interpret the comparison of Mary's and John's tallness, so we add additional material to (3.123) to interpret the comparison between Michael's cheerfulness and the possibility of cheerfulness. (3.123) As cheerfully as (it is) possible (to enter the room), Michael entered the room. As well appears once in our corpus, shown in (3.124), indicating that its semantic argument can be implicit. A number of other instantiations of "as + adverb + as" can, at least in spoken language, leave their argument implicit, including as often, as recently. (3.125) also seems felicitous. Moreover, this author uses the phrase no matter as an S-modifier without the accompanying WH-phrase and depends on context for interpretation. (3.124) Frank Gilmartin, a trader who follows insurance stocks for Fox-Pitt Kelton, said his strategy was to sell early. Then, if the stocks fell sharply, he planned to begin buying them aggressively, on the theory that the companies that insure against property damage and accidents will have to raise rates eventually to compensate for the claims they will pay to earthquake victims and victims of last month's Hurricane Hugo. As well, reinsurers and insurance brokerage companies will have improved profits. (WSJ) (3.125) Jane smiled widely and burst into laughter. As cheerfully, Michael began to sing. 3.6.4 Sets and Worlds Approximate corpus counts of ADVP adverbials discussed in this section are shown in Table 3.64. 149
    • Table 3.64: Approximate Counts of Tokens and Types of some ADVP Adverbials # of Tokens 137 184 279 56 151 263 # of Types 16 29 19 31 49 96 Category Ordinal Frequency Epistemic Domain Non-Specific Evaluative/Agent-Oriented There are a number of ordinal ADVP in our data. Although only first co-occurs with an overt PP (first of all occurred 6 times)30 , the second column of Table 3.65 shows that others can as well. Table 3.66 shows synonymous -ly counterparts of such ordinals. Table 3.65: Ordinal ADVP Adverbials # 34 12 7 3 1 6 ADVP Adverbial first second third fourth last next Explicit Set first of all second of all third of all fourth of all last of all next to that Table 3.66: Ordinal -ly ADVP Adverbials # 5 1 ADVP Adverbial secondly thirdly # 1 47 ADVP Adverbial lastly finally Ordinals indicate the order of the modified element in a larger set. Semantically this set can be represented as a hidden argument, which may or may not be made explicit with a PP and may or may not be resolved to a set of abstract object interpretations of VP or S in the prior discourse. For 30 next to NP also occurred twice, e.g. “next to the ocean”; this interpretation yields a spatial ordering, however. 150
    • example, in (3.126 a), the relevant set is interpreted as the reasons why Mary is a space cadet, and in (3.126 b), the relevant set is interpreted as the NP many things.... Also required in the semantics of ordinals is representation of placement in the set. For example, on second thought requires a first thought, third of all requires two prior elements, and next requires at least one prior element. (3.126 a) Mary is a space cadet. First (of all the reasons why), she always forgets to buy milk for the morning. (3.126 b) I want to do many things today. First (of all those things), I want to buy milk. Other ADVP adverbials in our corpus that invoke sets are “frequency” ADVP adverbials. Some examples occurring alone and other with adverb modifiers are shown in Table 3.67. At times, as usual, from our PP corpus, are also frequency adverbials. Unlike ordinals, however, frequency adverbials invoke a particular type of set: a set of times. But they evoke this set the way quantifiers such as few, all evoke sets; although the number of elements contained in these sets may be specified (e.g. always versus occasionally), the set and its elements may but need not resolve in the prior discourse. Frequency adverbials have received a variety of semantic treatments; in [Roo95b], for example, sets of times are equivalent to events. Table 3.67: Frequency ADVP Adverbials # 4 2 8 19 ADVP Adverbial frequently always occasionally often # 19 2 8 49 ADVP Adverbial once twice occasionally sometimes # 14 1 37 2 ADVP Adverbial usually almost daily again all too often “Epistemic” ADVP adverbials have also been analyzed as invoking sets. Some examples found in our corpus are shown in Table 3.68. In truth, in fact, from our PP corpus, are also epistemic adverbials. Epistemic adverbials also invoke a particular type of set: a set of possible worlds, which are the foundation of a variety of intensional semantics (c.f. [HK98]). Sets of possible worlds, and the worlds contained therein may but need not resolve in the prior discourse, and epistemic adverbials may specify the number of worlds contained in the set (e.g. probably versus possibly). Other ADVP adverbials in our corpus that can be analyzed as evoking sets are “domain” ADVP 151
    • Table 3.68: Epistemic ADVP Adverbials # 83 48 31 11 ADVP Adverbial perhaps maybe actually surely # 2 13 22 44 ADVP Adverbial possibly probably certainly indeed # 3 5 5 1 ADVP Adverbial unquestionably really undoubtedly truly adverbials. Some examples are shown in Table 3.69. In theory, in psychology, from our PP corpus, are also domain PP adverbials. The domains invoked by these adverbials are again not anaphoric, but they can be very idiosyncratic; for example, the logical domain (e.g. where logic holds), the psychological domain, etc. Because domain adverbs specify a particular domain; alternative domains (where, for example, the assertion does not hold) may be evoked, making them feel more like epistemics; we will explore in Chapter 5 how such alternatives can be implicated. Table 3.69: Domain ADVP Adverbials # 1 6 1 1 ADVP Adverbial logically microscopically theoretically psychologically # 1 1 1 1 ADVP Adverbial literally physiologically visually publicly # 1 1 2 1 ADVP Adverbial mathematically technically statistically functionally On the other hand, there are a wide variety of ADVP and PP adverbials, exemplified in Table 3.70, which say next to nothing about the properties of the set they invoke or its elements. Rather, they specify the comparative position of the element they modify to that set. Some are analyzed as focus particles, particularly those in the last column; we will discuss them in Chapter 5. As discussed in Section 3.3, however, many adverbs are not so easily classified into a single modification type. Rather, they seem to have properties of numerous modification types. For example, the ADVP in the first column of Table 3.71 are at once domain and frequency adverbs; in our terms, the sets they describe have both temporal and idiosyncratic features. The ADVP in the second column supply less specific temporal information, they feel more like epistemics, as does the 152
    • Table 3.70: Non-Specific Set-Evoking ADVP Adverbials # 2 3 1 3 3 1 2 2 2 ADVP Adverbial basically essentially fundamentally specifically significantly primarily mainly mostly partly # 2 3 17 41 1 3 1 13 3 PP Adverbial on the whole in essence in particular at least at worst at best in the main in short in part # 48 2 6 6 11 ADVP Adverbial also even just only too PP adverbial counterpart in general. The ADVP in the third column of Table 3.71 are both ordinals and temporal; they describe ordered sets of times or events. Table 3.71: Multiply-Featured ADVP Adverbials # 2 1 ADVP Adverbial historically traditionally # 2 2 11 ADVP Adverbial ordinarily typically generally # 5 3 4 ADVP Adverbial initially originally ultimately Many ADVP adverbials in our corpus which have been classified as “evaluative” also convey “domain”, and/or “epistemic”, and/or “spatial” properties. Some examples are shown in Table 3.72. Again, we can use sets with multiply featured idiosyncratic properties to represent this. Table 3.72: More Multiply-Featured ADVP Adverbials # 10 13 2 19 ADVP Adverbial clearly naturally evidently apparently # 4 1 1 5 ADVP Adverbial ideally reputedly conceivably presumably # 23 2 1 4 ADVP Adverbial obviously hopefully seemingly inevitably “Evaluative” adverbials are also easily confused with “Agent-Oriented” adverbials. Some ex153
    • amples are shown in Table 3.73 along with their counts. Table 3.73: Evaluative or Agent-Oriented ADVP Adverbials # 2 1 1 1 ADVP Adverbial carefully deliberately enthusiastically relentlessly # 2 1 1 1 ADVP Adverbial cautiously desperately stealthily gently # 1 2 1 1 ADVP Adverbial convulsively emotionally tardily bluntly While, as discussed in Section 3.4, [Ern84] clarifies the distinction between event and situation modification (e.g. Carefully John approached the Duchess carelessly), there is another distinction which he overlooks that can help clarify the distinction between evaluate and agent-oriented adverbs. This distinction can be viewed as an additional hidden argument, paraphrased as to X or, in X’s opinion. For example, Carefully (in John’s opinion), John approached the Duchess carelessly describes John’s opinion about his action, and thus carefully is agent-oriented. But Carefully (in the opinion of all outside observers), John approached the Duchess carelessly describes an omniscient opinion about John’s action (which may or may not include John’s opinion), and thus carefully is evaluative. How the reader resolves X determines the function of these adverbs. [Swa88]’s view of not only evaluative and agent-oriented, but also epistemic adverbs as Speaker says ADJ(S) is similar; only the hidden argument approach, however, acknowledges cases where the implicit X is resolved to someone other than the speaker. Such an analysis applies to other evaluatives, including (not) surprisingly, luckily, (un)fortunately, miraculously, all of which are found in our corpus, and some of which, along with some of those in Table 3.72, are intuitively treated by [Kno96] as discourse connectives. In other words, their treatment appears to be due to the effect of this hidden opinion, namely, that opinions are asserted for a reason. If something is asserted to be obvious, or (not) surprising, then there exists some reason that assertion is made, be it visual or otherwise apparent, as obvious implies, or a logical cause, as inevitably implies. More generally, such an analysis should apply to all adverb uses. For example, if something is asserted to be probable, deliberate, obvious, who the assertion is attributed to is also relevant; it may attributed to a particular someone, or to everyone, as is likely the case 154
when something is asserted to be first. Clearly, additional semantic (and pragmatic) complexity is required to represent this effect, such as a semantics which incorporates mutual knowledge and speaker's beliefs (cf. [LA99, HK98, Hir91]).

3.6.5 Summary

In this section, we've presented a wide variety of ADVP adverbials from our corpus. Using the same semantic mechanisms as in Section 3.5, we've argued that clausal and discourse ADVP adverbials can be distinguished by the interpretation of the adverb and its hidden semantic arguments. We've seen that although such cases are not the focus of the clause-level ADVP literature discussed in Sections 3.3-3.4, many ADVP adverbials in our corpus can take an optional PP that instantiates this hidden argument. A summary of some of the mechanisms that can cause an ADVP adverbial to require an entity from the discourse or spatio-temporal context for its interpretation is shown in Table 3.74.

Table 3.74: ADVP Adverbial Summary
  potential DD: here/now
  hidden NP argument: unfortunately (for me)
  hidden AO argument: consequently (on that)
  hidden AO argument: first (about that)

3.7 Conclusion

In this chapter we have shown that in many cases what have been called "cue phrases" or "discourse connectives" are not an accidental grouping of ADVP and PP adverbials; rather, their discourse properties arise naturally from their semantics. We have shown that whether or not ADVP and PP adverbials found in a corpus function as discourse connectives and are classified as discourse adverbials depends on the interpretation of their semantic arguments. We have shown that discourse adverbials are very similar to discourse deixis, in that both require for their interpretation an AO
made available from a VP or S in the prior context. In Chapter 4 we will discuss ways in which the semantic framework outlined here can be formalized and incorporated into the DLTAG model. In Chapter 5 we will discuss other ways, apart from the interpretation of their semantic arguments, in which adverbials contribute to discourse coherence.
Chapter 4

Incorporating Adverbial Semantics into DLTAG

4.1 Introduction

In Chapter 2, we described similarities and differences between discourse theories in terms of the modules necessary for a complete discourse model. We introduced DLTAG as a theory of intermediate-level discourse structure that bridges the gap between clause and discourse theories by treating discourse connectives as predicates and using the same syntactic and semantic mechanisms that build the clause to build discourse. In Chapter 3, we distinguished discourse and clausal adverbials and showed how the predicate argument structure and interpretation of discourse adverbials cause them to function as discourse connectives. In this chapter, we investigate how the predicate argument structure and interpretation of adverbials can be incorporated into a syntax-semantic interface for the DLTAG model. This chapter will be exploratory rather than conclusive, as a complete syntax-semantic interface incorporating all aspects of DLTAG and all discourse connectives requires a thesis in its own right. In Section 4.2, we discuss the role of the syntax-semantic interface, review LTAG, the clause-level model upon which DLTAG is built, and compare syntax-semantic interfaces that have been proposed for LTAG. In Section 4.3, we review DLTAG, discuss a syntax-semantic interface that has been proposed for
a similar tree-based discourse model, and explore DLTAG extensions of LTAG syntax-semantic interfaces. In Section 4.4, we discuss how the DLTAG annotation project can be used to develop anaphora resolution algorithms for the anaphoric arguments of discourse adverbials and to train a statistical version of the DLTAG parser to resolve ambiguous structural connections.

4.2 Syntax-Semantic Interfaces at the Sentence Level

4.2.1 The Role of the Syntax-Semantic Interface

Natural language can be defined as a set of objects.1 At the sentence level, for example, these objects correspond to grammatical and interpretable sentences; at the discourse level, they correspond to grammatical and interpretable discourse units. The goal of a language grammar (syntax) is to reproduce the structures of all and only these objects. The grammar must therefore characterize the properties of this set (e.g. whether it is finite or infinite), define the minimal units that compose its members, and define rules for combining these minimal units that produce all and only these members. In the same way, the goal of a formal meaning representation (semantics) is to reproduce the interpretation of all and only these objects. Again the properties of this set must be characterized, and the minimal units that compose its elements defined, along with rules for combining them that produce all and only these interpretations.

In linguistics, the principle of compositionality asserts that the meaning of a whole is a function of the meaning of its parts. At the sentence level, for example, meaningful sentence components correspond to syntactic sentence constituents. Thus, if in a grammar for English we have the rule that a sentence is minimally composed of a noun phrase followed by a verb phrase (e.g. S → NP VP), then we might suppose a one-to-one correspondence between syntactic rules and semantic rules, such that the interpretation of a sentence is minimally composed of the interpretation of a noun phrase combined with the interpretation of a verb phrase (e.g. S' = f(NP', VP')). The goal of the syntax-semantic interface is then to define, with respect to the syntactic structure, the extraction from and assembly of the components involved in its interpretation.

1 This discussion is derived from [Gaz99].
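To make the rule-to-rule picture concrete, the following is a minimal sketch, in Python, of pairing a semantic rule with the syntactic rule S → NP VP so that S' = f(NP', VP'). It is purely illustrative; the function names and string meaning representations are my own and are not part of any formalism discussed here.

```python
def sem_np_john():
    # NP': the meaning contributed by the noun phrase "John"
    return "john"

def sem_vp_walks():
    # VP': a function from an individual to a proposition
    return lambda subject: f"walk({subject})"

def sem_s(np_meaning, vp_meaning):
    # the semantic rule paired with S -> NP VP:  S' = f(NP', VP')
    return vp_meaning(np_meaning)

print(sem_s(sem_np_john(), sem_vp_walks()))   # walk(john)
```

The syntax-semantic interface, in these terms, is the pairing itself: the statement of which semantic rule accompanies which syntactic rule, and of how the meanings of the parts are extracted and assembled.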
    • 4.2.2 LTAG: Lexicalized Tree Adjoining Grammar An LTAG (see [JVS99]) is a lexicalized tree-adjoining grammar which itself is an extension of a tree adjoining grammar (TAG) (see [Jos87]). The object language of an LTAG (or TAG) is a set of trees, rather than strings. Trees allow the underlying structure of a surface string to be represented, as well as the string itself. The languages generated by (L)TAGs are well known to be “mildly-context sensitive”, properly containing the context-free languages and properly contained by the indexed languages. An LTAG consists of a finite set of elementary trees and operations for combining them. Elementary trees are associated with at least one lexical item, called the anchor. Elementary trees represent extended projections of the anchor and encode the syntactic/semantic arguments of the anchor. An anchor may be associated with more than one tree, called a tree family, each tree reflects the different syntactic constructions in which that anchor can appear. For example, the verb eat may be either transitive or intransitive; each of these forms is given a corresponding tree. There are two types of elementary trees in an LTAG: initial trees, which encode basic predicateargument relations, and auxiliary trees, which encode optional modification and must contain a non-terminal node (called the foot node) whose label matches the label of the root. Examples of elementary LTAG trees are shown in Figure 4.1. The final tree in this figure is an auxiliary tree, all the others are initial trees. S N NP D John VP D N ADV the dog VP Ì Ë Ú ÙÙ Ú NP VP Ë% Ï Ï Ð Ð Ë3 Í Í Î Í Î Î V × Ö Ö × NP NP often walks Figure 4.1: Elementary LTAG Trees There are two structure-building operations in an LTAG for creating complex trees, called derived trees: substitution (indicated by ) and adjunction (indicated by ). Ì Ë Ë The substitution operation is restricted to non-terminal nodes marked by on the tree frontier. Substitution consists of replacing this node with the tree being substituted. Only initial trees or 159
    • trees derived from initial trees can be substituted, and the root node of the tree being substituted must match the label of the node being replaced. For example, the tree anchored by the can be Ë âá substituted for the node labeled in the tree anchored by dog. This tree can then be substituted for the internal argument (NP ) in the tree anchored by walks, and the tree anchored by John can be % substituted for the external argument (NP ) in the tree anchored by walks. The result of these two 3 substition operations is the derived tree in Figure 4.2. S Í Í Î Í Î Î VP •• • – –– NP V John Ú ÙÙ Ú N walks NP D N the dog Figure 4.2: LTAG Derived Tree after Substitions The adjunction operation is restricted to non-terminal nodes not already marked for substitution. (initial, must match ) to which it is to be adjoined. If this is the case, the root node of is attached to the foot node of , and the ) ‡ ‡ ‡ ‡ rest of the tree that dominated is )  identified with ; the subtree that was dominated by )  in by adjunction, the root node of ) the label of the node and ) auxiliary, or derived). In order to combine and any other tree  Adjunction consists of building a new tree from an auxiliary tree now dominates the root node of . For example, the tree anchored by often can adjoin to the VP node of the derived tree in Figure 4.2, producing the derived tree corresponding to the sentence John often walks the dog, as shown in Figure 4.3. The trees in Figures 4.2 and 4.3 do not record the information about which elementary trees were combined and which operations were used to combine them. Thus, in addition to the derived ‡ „C˜ j ãC oF 9–qRU"qE tree that represents the results of combining trees to form complex trees, a tree in LTAG specifies uniquely how a derived tree was constructed. Nodes are labeled according to the elementary tree involved at that point in the derivation; the root of the derivation tree corresponds to the ¦ f¡ 160 labels initial trees whose anchor is (by defi- A label of the initial tree whose root is S. In general,
    • labels auxiliary trees whose anchor is (by definition an A ) ¦ nition an initial tree is substituted), and auxiliary tree is adjoined). A tree address is also associated with each node in the derivation tree ˜ ‡F o j ™"pGe€ except the root. This address is the address of the node in the tree at which the substitution or adjunction operation has occurred. The address of the root node in the parent tree is 0, the address  42  child of that root is (k), and (p.q) is the address of the child of a node . €  42 ä of the S ÍÍ Í ÎÎ Í Î Î VP Í Í Î Í Î Î ADV John often VP •• • – –– N Ú ÙÙ Ú NP V walks NP D N the dog Figure 4.3: LTAG Derived Tree After Adjunction The derivation tree corresponding to the derived tree in Figure 4.3 is shown in Figure 4.4. As  ¦¤ ¢ "©¨§¥£¡ p& ($¡ 32 0 541 ) 3 3 2 4£¡ % substitutes into at at address (1) (D). p (&$¡ 32 0 94"!  "! ¡ ) ““ “ “ “ ” ” ””  ¦ ¤ ¢” ©¨§“ ¥£¡ ” (1) p& (f¡ at address (2) (VP), and at address  ¦¤ ¢ "8¨76$¡ substitutes into  ¦¤ ¢ ©¨§¥£¡ address (2.2) (NP ). adjoins into substitutes into  "! ¡ (1) (NP ), while .  ¦¤ ¢ "8¨76$¡ shown, the initial tree rooting the derivation is (2) (2.2) (1) 3 2 4$¡ Figure 4.4: LTAG Derivation Tree 4.2.3 A Syntax-Semantic Interface for LTAG Derivation Trees [JVS99] argue that if the LTAG operations of substitution and adjunction are viewed as attachments of one tree to another tree, then a syntactic sentence derivation consists of an unordered set of attachments, and the corresponding semantics can built monotonically as a semantics of attachments 161
    • and described by a flat semantic representation. For example, the semantic representation of the sentence corresponding to the structure in Figure 4.3 might be as in (4.1), where i1, i2 denote individuals and e1 denotes an event. t walks(e1, i1, i2) dog(i1) t t (4.1) john(i1) often(e1) [JVS99] note however, that the adjunction operation does not preserve monotonicity with respect to the immediate domination relation; the VP that is immediately dominated by S immediately , but no longer immediately dominates V after adjunction of 32 0  941f¡  ¦¤ ¢ ©¨§¥£¡ dominates V in . Although under-specification of the immediate dominance relation would remove this nonmonotonicity, the tacit assumption in many LTAG formalisms is rather that the nodes on the trunk of a tree (the path from the root to the anchor) are not distinguished semantically; ([KJ99, Kal02, JVS99] and references therein) argue that using a semantics defined in terms of syntactic attachments removes the need to make use of syntactic under-specification if compositional semantics is defined with respect to the derivation tree (Figure 4.4), rather than the derived tree (Figure 4.3). These authors argue that this is the natural tree upon which compositional semantics of sentences should be built, because the predicate argument structure of a lexical item is represented only in the derivation tree (not in the derived tree), and only the derivation tree records the different elementary trees involved in the derivation and distinguishes the substitution of arguments into elementary trees from the optional modification of (adjunction to) lexical items by other lexical items2 . To build the compositional semantics from the derivation tree, [JVS99] associate a tripartite semantic representation with each elementary tree. The first part of the representation specifies the main variable of the predication. The second part (inner box) states the predication. The third part (lower boxes) associates variables with argument nodes in the elementary trees. As an example, the are shown in Figure 4.5. in the subject position of is obtained by unifying the variable (x ) corresponding to the subject node3 in w is € ˜ k Y„ j  "!¡  ¦¤ ¢ "8¨76$¡ the variable (x ) which the representation corresponding to  ¦¤ ¢ ©¨§¥£¡  ¦¤ ¢ "8¨76$¡ The composition of these representations after substitution of  ¡ and  "!¡ semantic representations of (transitive) with . After unification, the ™ 2 See [JVS99] for details concerning differences between the domination relations in derivation and dependency trees, and constraints on how the derivation tree must be traversed when building the compositional semantics. 3 This will be formalized below by [Kal02] by stating pairs of variables and corresponding node addresses. 162
    • second parts (inner boxes) are merged. The resulting composition is still . The semantic representation after composing "! ¡   ¦¤ ¢ "8¨76$¡ € ˜ k e„ j wF represented as the event of walking and is shown in Figure 4.6.  ¡ named: x about: x w ™ walk(e , x , x ) g John(x ) ™ w w x2  ¦¤ ¢ "8¨§¥$¡ Figure 4.5: Semantic Representations of and  "! ¡ x1 ™  ¦¤ ¢ "8¨76$¡ about: e John walks w about: e named: x w walk(e , x , x ) John(x ) g w w w x2 Figure 4.6: Semantic Representations of John walks [JKR03, Kal02, KJ99] further formalize a syntax-semantic interface in which compositional semantics depends on the LTAG derivation tree. In order to represent scope ambiguities in quantifier adjuncts, they employ a restricted use of multi-component TAGs, along with more complex flat semantic representation using ideas from Minimal Recursion Semantics ([CFS97]) and Hole Semantics ([CFS97, Bos95]), which consists of three parts: typed lambda expressions, scope constraints, and argument variables. [Kal02] further argues that the derivation tree can be enriched with additional links to support a compositional semantics that can represent all the differentiated scope orderings of quantifier adjuncts, such as those produced by quantifier and PP adjuncts of NPs4 . We will discuss details of this approach below in reference to DLTAG; in the remainder of this section we illustrate how the compositional semantics of a simple example can be built from the , ) , 32 0 941  "! ¡ "8¨76$¡  ¦¤ ¢ LTAG derivation tree. [JKR03, Kal02, KJ99]’s semantic representations for 4 [FvG01] propose an alternative solution to some of these problems that uses information from both the derivation and the derived LTAG tree. 163
    • are shown in Figure 4.75 . As shown, lambda expressions (formulas) may contain propo- %A (whose values are propositional labels), and hole variables %@ variables (meta-variables for proposition labels), as well as propositional argument %”  &% # ('¥$¡ sitional labels , holes % † and (whose values are holes). Holes and labels are used to generate under-specified semantic representations and allow for scope å Æw which may produce scope ambiguity constrained as h l . w wA a hole above the proposition label , for example, there is 8¨§¥$¡ ¦¤ ¢ ambiguities, whose ordering is constrained by the scope constraints. In Quantifiers and adverbs can introduce additional holes and labels, as shown; scope disambiguation occurs when these semantic representations are combined, as discussed below. Argument variables may be of any variable type, and may be linked to addresses in the syntactic tree. This linking is x ,(1) , which indicates that x is linked to address (1)). t w 32 0 541 ) g , and w å æw ç , g g ™ , å g g  ¦¤ ¢ "8¨76$¡  &% #    ('¥$¡ "! ¡ ©¨§¥£¡  ¦¤ ¢ t g Û t w w w å æw w Û Figure 4.7: Semantic Representations of l : often(h ) l ,h s g —————— arg: g , s w  &% # ('¥f¡ Fido(x ) ———– arg: – John(x ) ———– arg: – 32 0  541$¡ "!¡ w Û l : walk(x , x ) l h ——————————— arg: x ,(1) , x ,(2.2) w explicitly represented as a pair (e.g. When the derivation tree for the sentence John often walks Fido is built from its constituent trees, the semantic representation of this sentence can also be built. This representation is shown in Figure 4.8. John often walks Fido l : walk(x , x ), John(x ), Fido(x ), l : often(h ) h l ,h l ,h l —————————————————————— arg: – g g ç ™ w å g g ç å æw ™ w å æw w Figure 4.8: Semantic Representations of John often walks Fido Combining semantic representations consists of building the union of the semantic representations of the elementary trees involved in the derivation and assigning values to argument variables. The derivation tree indicates how the semantic representations are to be combined such that their ar5 As [JKR03, Kal02, KJ99] do not discuss their semantic representation of definite determiners, we illustrate their basic approach by replacing the dog from our earlier example with . ëê "4é è 164
    • gument variables get values. When a tree is substituted, its value is applied to the argument variable paired with the position at which it attaches. For example, the derivation tree in Figure 4.3 indicates  ¡ w  &% # §T¥f¡ from the semantic representation of , and the value of x comes g that the value of x comes from the semantic representation of . When a tree is adjoined, however, its argument variables are added to the representation and are assigned the values at the position at which it attaches. For example, the derivation tree in Figure . Because % w is a propositional argument variable whose values are propositional labels, it is assigned to is a hole variable whose values are holes, it is assigned to w † w ” w @ and because wA 8¨76$¡ ¦¤ ¢ 4.3 indicates that the values of s and g come from the semantic representation of , . These assignments are reflected in the scope constraints in the combined semantic representation. Scope disambiguation consists of finding bijections from holes to labels that obey the scope l . We also know å g g l and we already know that l t g g å w g w ìw uv l , because if h = l then l l , and l . w g w w t xg we know that h t Æg å w h , because h appears inside the formula labeled l . We therefore know that l w g that l l and h w constraints. According to the scope constraints in Figure 4.8, h Therefore, the only possible disambiguation of our holes is: h =l and h = l . This yields the g w g w embedded semantic representation in (4.2). walk(x , x ) often(walk(x , x )) ™ t ç Fido(x ) ç t ç ™ t (4.2) John(x3) Note that we have presented in this example the more complex semantics involved in the treatment of scope ambiguities for the purpose of showing how scope ambiguities are handled semantically in this approach; below when we consider a DLTAG version of this approach we will not consider scope ambiguities, and we will thus follow [KJ99] in using simplified semantic representations such as shown in Figure 4.9. 32 0  941f¡  ¦¤ ¢ "8¨76$¡ l : often(s ) ————— arg: s g ) and 32 0 941  ¦¤ ¢ "8¨76$¡ t 165 w g g Û t w w w Û Figure 4.9: Simplified Semantic Representation of w l : walk(x , x ) ——————————— arg: x ,(1) , x ,(2.2)
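The search for bijections from holes to labels that respect the scope constraints can be illustrated with a toy solver. This is my own sketch, not the machinery of [JKR03, Kal02, KJ99]: the hole and label names, and the hard-coded fact that the label for often passes scope down through its argument hole, are simplifications of the representations in Figure 4.8.

```python
from itertools import permutations

holes = ["h0", "h1"]
labels = ["l1", "l2"]          # l1: walk(...), l2: often(h1)
constraints = [("h0", "l1"), ("h0", "l2"), ("h1", "l1")]   # "hole outscopes label" pairs

def outscopes(assign, hole, label, depth=0):
    # hole >= label if the hole is plugged directly with the label, or is
    # plugged with often(h1), whose argument hole must then outscope the label
    if depth > len(holes):                 # guard against cyclic pluggings
        return False
    if assign[hole] == label:
        return True
    if assign[hole] == "l2":               # l2 = often(h1)
        return outscopes(assign, "h1", label, depth + 1)
    return False

solutions = []
for perm in permutations(labels):          # candidate bijections
    assign = dict(zip(holes, perm))
    if all(outscopes(assign, h, l) for h, l in constraints):
        solutions.append(assign)

print(solutions)   # [{'h0': 'l2', 'h1': 'l1'}]: often scopes over walk, as in (4.2)
```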
    • 4.2.4 A Syntax-Semantic Interface for LTAG Elementary Trees [SBDP00] present an extension of the LTAG grammar in which the lexical entries of motion verbs are crafted to make it as easy as possible for a natural language generation system (SPUD, [SD97]) to select the verb that best helps its communicative goals be achieved. Briefly, SPUD generates instructions for action in a concrete domain. SPUD’s desired output is to mirror the naturallyoccurring action instructions in a selected corpus. SPUD’s input consists of (1) a representation of the context in which instruction is to be issued, (2) a set of communicative goals describing content that the instruction should contain, (3) a database of facts describing generalized individuals involved in the action (e.g. paths, places and eventualities). When planning a sentence, SPUD searches the derivations of a true sentence that are admitted by the grammar for one which best achieves its communicative goals in the current context. [SBDP00] construct a lexical entry, consisting of an elementary tree, a syntax-semantic interface, and a semantics, for five motion verbs, based on an analysis of their use in an instructional corpus. The trees associate each anchor (verb) with its observed range and order of complements and modifiers according to their modification type6 , by using exhaustive node hierarchies in the elementary tree, thereby allowing SPUD to generate these elements in any order while still producing the correct surface order. As an example, the elementary tree for slide is shown in Figure 4.10. S VP-DUR Ì VP-PRP Ì Ì •• – • – – NP × Ì Ö Ö × VP-PTH NP Ë V slide Figure 4.10: The Elementary Tree for slide This tree structure represents the observed type, order, and optionality in the investigated corpus 6 see [KP02]) and Chapter 3 for discussion of these modification types. 166
    • of the arguments and modifiers of slide. As shown, all optional elements, whether they are determined to be optional arguments or adjuncts, are represented using adjunction so that SPUD is not forced to anticipate how the sentence is to be completed before selecting among alternative trees. An example demonstrating the use of slide and these elements is shown in (4.3). quickly to achieve a tight seal. †˜ j !Ge€ ‡ „C˜ j o k 9GUepE Slide F@„ €o k beVYY€ (4.3a) the cover across the pole (PTH) (DUR) (PRP) € ˜iF Vg–{ „ SPUD uses an ontologically promiscuous semantics [Hob85] such that each lexical entry used in the derivation of an utterance contributes a constraint to its overall semantics; the syntax-semantic interface determines which of the constraints contributed by an entry describe the same generalized individuals. For example, given the phrase slide the sleeve quickly, the syntax-semantic interface í BF guarantees that the event described by slide is identified with an event that is quick. To express F the semantic relationships between multiple entries in a derivation, [SBDP00] associate each node in the elementary tree with the individuals that the node describes. When one tree combines with another by substitution or adjunction, a node in one tree is identified with a node in another tree and the corresponding entities are unified. The individuals associated with each node in the elementary tree in Figure 4.10 are shown in Figure 4.11. The collection of individuals associated with the nodes of a verb tree are called its semantic arguments. [SBDP00]’s notion of a semantic argument is clearly distinguished from the LTAG notion of an syntactic argument. Each syntactic argument corresponds to one semantic argument (or more), since the syntactic argument position is a node in the tree and is associated with some semantic argument(s). However, semantic arguments need not be associated with syntactic argu- F ment positions. For example, there is no node in the tree into which the eventuality described by slide substitutes, but eventualities are nevertheless treated as a semantic argument of both the verb and of event modifiers such as quickly, as in [Dav67]. Moreover, optional constituents specifying paths, durations, or purposes are usually treated syntactically as modifiers, using adjunction. Here, although all optionality is treated via syntactic adjunction, these adjunction sites may either be associated with references to only the overall eventuality argument of the verb, making these adjuncts 167
    • semantic adjuncts as well, or with references to additional semantic arguments (e.g. paths), making them semantic arguments. [SBDP00, 4] note however that it is a substantive question for grammar design which optional constituents should be treated as specifying additional semantic arguments for a given verb entry; they make use of the tests described in Chapter 3 to distinguish semantic arguments from semantic adjuncts (e.g. the do so test, the extraction test, the presupposition test). event: e path: p changed: obj u Ì ~~ „ ~~ î ôË u u ƒ      ‚ƒ         } NP event: e path: p changed: obj ind: obj ï event: e „ u óÌ } V î u ÈÌ VP-PTH î VP-DUR event: e 8ƒƒ‚ u òÌ VP-PRP ind: agent ï ï î ð ð ð ð ñ ñ ñ ñ ð ð ð ð ð ñ ñ ñ ñ ñ ï î ðuñ NP event: e ï S slide F ECA G9DB@ Figure 4.11: The Syntax-Semantic Interface for [SBDP00] further specify the semantic representation associated with each verb entry, in terms of an assertion and a presupposition about the individuals referenced in the tree. The semantic representation associated with the verb entry for slide is shown in (4.4). (4.4 a) Presupposition: located-at-start(obj, ), along-surface( ) € € (4.4 b) Assertion: caused-motion( , agent, obj, ) € F In SPUD, the assertion contributes new relationships among generalized individuals. For example, slide asserts that an agent causes an object to move along a path. The presupposition indicates background knowledge about additional relationships between generalized individuals involved in the assertion7 . For example, a sliding event presupposes that the path along which the object travels 7 We discuss presupposition in detail in Chapter 5. 168
has an origin, and presupposes that this path involves a surface that remains in contact with the object during the sliding.

4.2.5 Comparison of Approaches

Although both [JKR03, Kal02, KJ99, JVS99] and [SBDP00] propose a semantics and a syntax-semantic interface linking it to the LTAG grammar, the two approaches differ in a number of superficial ways. First, [SBDP00] postulates a different set of elementary trees. Each of their trees fully specifies, as hierarchical nodes, all observed syntactic arguments and modifiers that appear with the corresponding verb in a corpus, regardless of whether they are determined to function as semantic arguments or adjuncts. In [JKR03, Kal02, KJ99, JVS99]'s approach, only syntactic arguments are represented in their elementary trees. Second, [SBDP00] specifies in the elementary tree the semantic arguments contributed by both argument and modifier nodes to the semantic representation, while [JKR03, Kal02, KJ99, JVS99] specify the semantic contribution of argument nodes only in the semantic representation associated with the entire elementary tree. Third, [JKR03, Kal02, KJ99, JVS99] argue that there is no semantic distinction between the nodes on the trunk of the syntactic tree (e.g. S-VP-VP-V), while [SBDP00] explicitly distinguishes the semantic contribution of each node. Fourth, [SBDP00] do not specify whether, during substitution or adjunction, the identification of the entities supplied by a complement or modifier with the entities associated with the corresponding node in the verb elementary tree is made on the basis of the derived tree or the derivation tree. [JKR03, Kal02, KJ99, JVS99] argue that the derivation tree is the appropriate place to specify the syntax-semantic interface because the derived tree is syntactically non-monotonic and only the derivation tree records the predicate argument structure and operation involved in constructing the syntactic representation.

Although [SBDP00] do not provide details on how the interpretation of a sentence is constructed, [SBDP00] and [JKR03, Kal02, KJ99, JVS99]'s approaches are potentially complementary. In particular, in [SBDP00]'s approach, each node has a unique address. Therefore the identification of the specified semantic arguments at each node with the semantic arguments of trees adjoined or substituted at those nodes can be based on the derivation tree.
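To keep the comparison concrete, here is a sketch of the kind of lexical entry [SBDP00] associate with slide, as I read Figures 4.10-4.11 and (4.4). The encoding as a Python dictionary, and the node and individual names, are my own; this is not SPUD's actual representation.

```python
# Hypothetical encoding of the slide entry: an anchored tree, the generalized
# individuals (semantic arguments), per-node individual labelings, and the
# assertion/presupposition split of (4.4).
slide_entry = {
    "anchor": "slide",
    "semantic_arguments": ["event", "agent", "obj", "path"],
    "assertion": ["caused-motion(event, agent, obj, path)"],
    "presupposition": ["located-at-start(obj, path)",
                       "along-surface(path)"],
    # node -> individuals described at that node (cf. Figure 4.11)
    "node_individuals": {
        "S": ["event"], "NP-agent": ["agent"], "NP-obj": ["obj"],
        "VP-PTH": ["event", "path"], "VP-DUR": ["event"], "VP-PRP": ["event"],
    },
}
```

On this encoding, substituting or adjoining another tree at a node simply identifies that tree's individuals with those listed for the node, which is all the interface needs to guarantee, for instance, that the event modified by quickly is the sliding event.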
The most significant difference between the two approaches is [SBDP00]'s pre-specification of all possible arguments and modifiers in the elementary tree. The purpose of this is to provide SPUD with all the information it may need to exploit in a computationally efficient way when generating an utterance according to its conversational goals. Given that SPUD applies in a limited domain and its output is meant to mirror a finite corpus, this approach may be most computationally efficient in this domain. It is not clear, however, that this approach is best when considering generation or interpretation in unlimited domains. For example, in Chapter 3 a wide range of modifiers drawn from the WSJ and Brown were discussed, some of which could be categorized into multiple modification types. It remains an open question whether it is possible to create a modification classification which is at once compact enough to be incorporated into the elementary trees of all verbs, and at the same time accurately and comprehensively covers the interpretations of (and any idiosyncratic semantic arguments specified by) all modifiers.

We end this section with the comment that LTAG is not the only syntactic grammar for natural language; Combinatory Categorial Grammar (CCG) [Ste96] is one alternative, which encodes both the syntactic and semantic properties of words in the lexicon. For example, the transitive verb like might have the lexical entry in (4.5).

(4.5) like = syn: (S\NP)/NP
             sem: λx λy. like(y, x)

Briefly, in CCG, / refers to a rightward-looking category and \ to a leftward-looking category. Categories combine using rules of function application. (4.5) states that the syntactic category of like is a function that requires its syntactic argument, an NP, on its right. A function application rule is applied, such that the NP on the right is identified and the corresponding semantic argument is simultaneously bound to the outer variable x. A new function is produced that requires its syntactic argument, again an NP, on its left. Another function application rule is applied, such that the NP on the left is identified and the corresponding semantic argument is simultaneously bound to the variable y. The result is an S with semantics like(y, x), where x and y have been bound as described. One reason for using CCG is that it removes the need for a syntax-semantic interface specifying how a semantics and a syntax are related.
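The two application steps just described can be rendered as a toy derivation of John likes Mary. The category encoding and the apply function below are my own sketch, not [Ste96]'s formalism.

```python
from dataclasses import dataclass

@dataclass
class Cat:
    result: object      # category returned after application (or an atomic name)
    slash: str = ""     # "/" seeks its argument to the right, "\\" to the left
    arg: object = None  # category of the expected argument

NP, S = Cat("NP"), Cat("S")
likes_cat = Cat(Cat(S, "\\", NP), "/", NP)          # (S\NP)/NP
likes_sem = lambda x: lambda y: f"like({y},{x})"    # lambda x. lambda y. like(y,x)

def apply(fn_cat, fn_sem, arg_cat, arg_sem):
    # one step of (forward or backward) function application
    assert fn_cat.arg == arg_cat
    return fn_cat.result, fn_sem(arg_sem)

# "likes Mary" by forward application, then "John likes Mary" by backward application
vp_cat, vp_sem = apply(likes_cat, likes_sem, NP, "mary")
s_cat, s_sem = apply(vp_cat, vp_sem, NP, "john")
print(s_sem)   # like(john,mary)
```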
See [Ste96, JVS99] for a discussion of similarities and differences between the two grammars.

4.2.6 Summary

In this section we have given an overview of the LTAG grammar and presented two syntax-semantic interfaces that have been proposed for LTAG. In the next section we will focus on the details of DLTAG, which builds discourse grammar directly on top of the LTAG clause grammar, and investigate how syntax-semantic interfaces for such LTAG-based discourse grammars can be defined.

4.3 Syntax-Semantic Interfaces at the Discourse Level

4.3.1 DLTAG: Lexicalized Tree Adjoining Grammar for Discourse

Like many of the other discourse models discussed in Chapter 2, DLTAG [FMP 01, CFM 02, WJSK03, WKJ99, WJSK99, WJ98] argues that discourse can be modeled in terms of syntax and semantics, but unlike other models, DLTAG defines discourse structure and meaning in terms of the same mechanisms that are already used at the sentence level, and builds them, furthermore, directly on top of the clause. As [WJSK03, 23] note, the notion that discourse relates to syntax and semantics in a completely different way than the sentence does seems strange when we consider examples such as (4.6), where an entire discourse is contained within a relative clause, exhibiting the same cohesive and argumentative connections which are characteristic of other discourses.

(4.6) Any farmer who has beaten a donkey and gone home regretting it and has then returned and apologised to the beast, deserves forgiveness.

In DLTAG, discourse connectives are treated akin to verbs at the clause level. As discussed in Section 4.2, verbs at the clause level are generally viewed as predicates that take entity interpretations and supply relations between them to form a sentence interpretation. In DLTAG, discourse connectives are also predicates that take entity interpretations and supply a relation between them to form interpretations of larger discourse units. The set of discourse connectives in DLTAG currently includes the subordinating and coordinating conjunctions, punctuation at clause boundaries,
    • and discourse adverbials, As we saw in Chapter 3, the entity interpretations they relate are abstract objects (AOs), which may come from non-NP constituents within clauses, clauses themselves, or discourse units that are composed from both clauses and discourse connectives, within or across sentence boundaries. We will thus use D to represent the syntactic arguments of all discourse con- S nectives and the unit that results (rather than S), both to indicate that these units are being analysed from the point of view of discourse, and because the clause-level syntactic consituent corresponding to these units can vary. DLTAG currently builds discourse using the structures and structure-building operations of (LTAG)[JVS99], which itself is widely used to model the syntax of sentences. As in LTAG, in DLTAG each elementary tree is anchored by at least one lexical item or corresponding feature structure8 . Also as in LTAG, there are two kinds of elementary trees: initial trees that represent atomic units and localize predicate argument dependencies, and auxiliary trees that represent optional modification. In DLTAG, two initial trees are proposed in the tree family that represents subordinating conjunctions, exemplified in Figure 4.12. Two trees are proposed because of the syntactic alternation that subordinating conjunctions allow with repect to their position relative to their arguments [QGLS85], as shown in (4.7). (4.7 a) John is hard to find, although he is generous. (4.7 b) Although he is generous, John is hard to find. D D ËS ËS ‘‘ ‘ ‘ ’ ’’ S‘ ’ ’ although D ËS ËS ‘ ‘ ‘ ö ’ ’ S ‘ö ’ ’ D Although D D Figure 4.12: DLTAG Initial Trees for Subordinating Conjunctions In LTAG, subordinated clauses are treated as adjuncts because they are not part of the extended projections (e.g. argument structure) of the verb of the main clause. In DLTAG, however, it is the extended projections of the discourse connective, not the verb, that are being modeled; subordinat8 See [WJ98] for a discussion of reasons for treating lexical anchors as feature structures that may or may not be lexicalized. 172
    • ing and coordinating conjunctions relate two clausal interpretation to form a larger discourse unit and thus are represented as taking two substituted arguments. Subordinating conjuctions are thus modeled with initial trees; as shown in (4.8), the local dependencies between their arguments (4.8 a) can be stretched long-distance via adjunction of an additional clause (4.8 b), as is also true of local dependencies at the clause level (e.g. Apples, Bill says John may like.). (4.8 a) Although John is generous, he’s hard to find. (4.8 b) Although John is generous–for example, he gives money to anyone who asks him for it-he’s hard to find. In DLTAG, two different types of auxiliary trees are proposed. The first type is used to represent simple coordination and an empty connective , as exemplified in Figure 4.13. H D and D D D Figure 4.13: DLTAG Auxiliary Tree for and and H ËS Ì S Í Í Î Î SÍ Î D ËS H Ì S •• • – –– S D While both arguments in these trees come structurally, adjunction to a discourse unit in the H prior discourse (rather than substitution) represents the fact that and, or, convey a continuation (or optional modification) of something in the prior discourse. In other words, simple coordination provides further description of a situation or of one or more entities (objects, events, situations, states, etc.) within the situation [WJSK03]. This is akin to the notion of elaboration, as exemplified in (4.9 a); it also accounts for cases where a coordinating conjunction is used to connect two clauses that supply the same relation via a structural connective to the prior discourse, as exemplified in (4.9 b), where both disjuncts convey an alternative point at which John will quit his job. (4.9 a) John went to the zoo [and/.] H/he took his cell phone with him. (4.9 b) John will quit his job when he wins the lottery [and/or] he marries a rich woman. The coordinating conjunctions so, but convey more than simple continuation, however; so, for example, conveys a result relation in (4.10). These connectives are thus represented with initial 173
    • trees9 . (4.10) You didn’t eat your spinach so you won’t get dessert. The second type of auxiliary tree proposed in DLTAG represents discourse adverbials; as shown in Figure 4.14, there are two trees in the tree family representing discourse adverbials due to their ability to appear S-initially and S-finally. Note however that, as discussed in Chapter 3, discourse adverbials, like all adverbials, can appear in a variety of other positions. Currently in DLTAG, discourse adverbials are extracted from S-internal positions and the modeled using the S-initial tree   with a trace left in their original position (see [FMP 01] for details). D then ÷÷ øø S Ì S ÷÷ øø S D then D Ì S D Figure 4.14: DLTAG Auxiliary Trees for Discourse Adverbials These DLTAG trees are structurally identical to LTAG trees for S-modifying adverbials (except for the label of their root node); the difference lies in the fact that in the DLTAG (discourse) grammar all discourse units are structurally related to the preceding discourse. A discourse adverbial is viewed as an optional modification of the incoming unit, which supplies an additional semantic relation over and above the semantic relation supplied by the structural connection; thus the argument a discourse adverbial modifies is represented as adjunction. Only the modified argument comes structurally, however. The other argument involved in the semantic relation supplied by the adverbial must be resolved anaphorically. In order to connect a discourse unit modified by a discourse adverbial to the prior discourse, therefore, a structural connective must be employed (e.g. , and, H etc.), which supplies its own semantic relation. Moreover, there may be additional inferred relations between the incoming unit and the prior discourse over and above both the relation supplied by the structural connection and the relation supplied by any discourse adverbials. For example, in DLTAG’s view, and as a result in (4.11) each supply a semantic relation between the interpretations of H John came home late and Mary left him; in addition, a temporal relation between these two clauses 9 This is not explicitly stated for but; however, by definition it must be the case. The same argument may also hold for certain uses of and, or; see [WJSK03], footnote 17, for discussion. 174
    • is inferred10 . (4.11) John came home late. As a result, Mary left him. As discussed in Chapters 2 and 3, behavioral evidence is presented in [WJSK03] to support the theoretical view that discourse adverbials take their prior argument anaphorically. This evidence includes the range of ways this argument can come from the prior discourse. For example, although discourse connectives are generally taken as signalling discourse relations between adjacent discourse units, just as can NP anaphora, discourse adverbials can also take their prior argument from intra-sentential and implicit material. In (4.12) [WJSK03, 7], embedded nevertheless relates the interpretation of the matrix clause to the interpretation of the relative clause. This option is not available to subordinating and coordinating conjunctions because their arguments are constrained to be adjacent discourse units of like syntactic type. (4.12) Many people who have developed network software have nevertheless never gotten very rich. (i.e. despite having developed network software) In (4.13) [WJSK03, 7], otherwise can access the inferred condition of if the light is not red. This material is not available to structural connectives; or can only access the consequent clause (stop) or the sentence as a whole. (4.13 a) If the light is red, stop. Otherwise go straight on. (4.13 b) If the light is red, stop, or go straight on. We saw further evidence in Chapter 3, where we considered the range of semantic mechanisms underlying the predicate argument structure and interpretation of S-modifying adverbials. We saw that we could distinguish discourse adverbials and clausal adverbials according to whether or not their interpretation depended on an abstract object in the prior discourse. We saw that discourse adverbials contain semantic arguments instantiated as explicit discourse deictic reference to abstract objects, demonstrative NP reference to abstract objects, comparative abstract objects, relational abstract objects, etc. Because all of these arguments are anaphoric to abstract objects in the prior discourse, the discourse adverbials containing them function semantically as discourse connectives. 10 See Chapter 2 for an example of how this inference is modeled (in [LA93]’s discourse model). 175
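As a schematic illustration of this division of labor (my own sketch, not the DLTAG implementation), a discourse adverbial can be thought of as supplying a relation whose second argument comes structurally from the unit it modifies, while its first argument must be resolved anaphorically against whatever abstract objects the prior discourse, or inference over it, has made available.

```python
def interpret_adverbial(relation, host_ao, available_aos, resolve):
    """relation: e.g. 'result' for "as a result"; host_ao: the AO of the unit
    the adverbial structurally modifies; available_aos: AOs contributed by the
    prior discourse (clauses, VPs, inferred material); resolve: whatever
    anaphora resolution procedure the application supplies."""
    antecedent = resolve(available_aos)          # anaphoric argument
    return (relation, antecedent, host_ao)       # relation(AO1, AO2)

# (4.11): "John came home late. As a result, Mary left him."
prior_aos = ["john-came-home-late"]
print(interpret_adverbial("result", "mary-left-john", prior_aos, lambda aos: aos[-1]))
# -> ('result', 'john-came-home-late', 'mary-left-john')
```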
    • Moreover, the anaphoric argument involved in the relation supplied by an adverbial may or may not resolve to the prior argument involved in the structural relation. In fact, a property of structural connectives is that they do not allow crossing of predicate-argument dependencies. For example, while (4.14) [WJSK03, 5] is interpretable as an embedded if S1, S2 construction within an although S1’, S2’ construction, crossing these dependencies as in (4.15) [WJSK03, 5] is either uninterpretable or at least yields a different interpretation; the dependencies in the original constructions are lost. a. if you need some money, you only have to ask him for it - d. he’s very hard to find. a. Although John is very generous - b. if you need some money - d. he’s very hard to find- c. (4.15) b. c. (4.14) Although John is very generous - you only have to ask him for it. It appears however that discourse adverbials do allow crossing dependencies, as shown in (4.16) [WJSK03, 6]. For then in (d) to get its first argument from (b), it must cross the structural connection F@ k ji 6GF between the clauses in (c) and (d) that are related by . Of course, as [WJSK03, 6] note, € anaphora frequently show crossing dependencies (e.g. John told Mike he would meet him later. g w g w [WJSK03] show that modelling these discourse relations structurally would create a directed acyclic graph, which goes beyond the computational power of LTAG and, moreoever, creates a completely unconstrained model of discourse structure. a. b. So he ordered three cases of the ’97. c. But he had to cancel the order d. (4.16) John loves Barolo. because then he discovered he was broke. It remains to be shown however whether or not all discourse adverbials allow crossing dependencies; the fact that the felicity of constructed examples can be difficult to determine has led   [FMP 01] to outline a more empirically-based approach to modeling discourse connectives. As 176
    • their goal is to build the most computationally efficient discourse parser possible, they argue that predicate argument dependencies should be defined structurally whenever possible, regardless of the compositional semantics of the predicate. In other words, the decision to treat a discourse connective anaphorically would be based entirely on whether corpus annotation indicates it to be necessary to avoid modeling crossing dependencies structurally, rather than on their compositional semantics or on constructed examples. This approach may help distinguish when compositional semantics determines how an adverbial retrieves its prior argument and when it does not. Certain discourse adverbials, however, are already represented in DLTAG as taking both arguments structurally. In particular, DLTAG distinguishes parallel constructions, conveying disjunction (“either...or”), contrast (“on the one hand...on the other hand”), addition “not only...but also”), and concession (“admittedly...but”). Because of the interpreted inter-dependency of the discourse connectives in these constructions, they are modeled with initial trees, as in Figure 4.15. D ËS ËS Í Í Î Í Í Î SÍ Î Î SÍ Î Î       S      D On the one hand D D On the other hand D Figure 4.15: DLTAG Initial Tree for Adverbial Constructions However, these discourse adverbials do not always appear in these parallel constructions. The majority of them, more frequently the second in the pair ([WJSK03]), can also appear alone, as in (4.17), in which case the discourse adverbial takes the first auxiliary tree in Figure 4.14. (4.17) Mike likes ice cream. On the other hand, he hates milk. We have already discussed how the two structure building operations in (D)LTAG work, we now illustrate with the example in (4.18) the DLTAG trees they produce. We will address additional examples in subsequent sections. (4.18) On the one hand, I noticed the solitude in New Hampshire, and then I hoped we could stay. On the other hand, when I noticed the lack of multi-culturalism, I hoped we would leave. We represent clauses using the symbol for initial trees subscripted by the name of the main verb 177
    • ). In fact, one of the benefits of DLTAG is that it parses discourse on top of the clause 3¨ ¦ 'd%f¡ (e.g. level parse; thus each atomic clause is itself a complex tree11 . The derived tree for (4.18) is shown in Figure 4.16. D W ¤2 (49 ¡ × 3§a(83f¡ RQ''U ¡ ¤ ¦ Ö × 3 %2 3Q%2  R'' ¡ ÖS ‘‘ ‘ ‘ ‘ ‘ ’ ’ ’ ’’ ‘ S‘ ’ ’ S ‘ ùù ’ ’ ““ “ “ ““ “ “ ”” ”” “ S“ ” ” ” ” “ S“ ” ” ” ” ðð ð ð ð ð ð ð ð ñ ñ ñ ññ ð ð Sð ñ ñ ñ ñ ñ ñ ñ D D On the one hand D On the other hand and D D when then Figure 4.16: DLTAG Derived Tree for Example (4.18) The derivation tree for our example is shown in Figure 4.17. We represent initial trees with (= on the one/other hand), auxiliary trees with discourse , and substitution and adjunction addresses are represented as above 3 542 ) connective anchors as e.g.  !!$¡ discourse connective anchors as e.g. (i.e. with respect to the parent tree (e.g. (2.2)). The reader should refer to the above figures to view the elementary trees that correspond to each discourse connective. 37a(83¦f¡ ¤ & U(¤ 3Q%2  R'' ¡ ) Í Í Î Í 93gÎ 6$¡   ¢Î 3 Q%2  9T'U ¡ ““ “ “ ” ””   !“ f¡” ” (1.2) (2.2) (0) (2) (3) W ¤2 §4R ¡ (3) 3 542 (0) ) Figure 4.17: DLTAG Derivation Tree for Example (4.18) See [FMP 01] for details of the DLTAG parser. Ñ 11 178
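For concreteness, the derivation just illustrated can also be recorded in the style of an LTAG derivation record. The nesting below is my reading of Figures 4.16-4.17; the tree names are illustrative, and the substitution and adjunction addresses are omitted rather than guessed.

```python
# A rough, hypothetical record of the derivation of (4.18).
derivation_4_18 = {
    "tree": "alpha_on_the_one_hand_on_the_other_hand",       # parallel initial tree
    "children": [
        {"tree": "alpha_noticed_1", "op": "substitution",     # first D argument
         "children": [
             {"tree": "beta_and", "op": "adjunction",
              "children": [
                  {"tree": "alpha_hoped_1", "op": "substitution",
                   "children": [
                       {"tree": "beta_then", "op": "adjunction"}]}]}]},
        {"tree": "alpha_when", "op": "substitution",          # second D argument
         "children": [
             {"tree": "alpha_noticed_2", "op": "substitution"},
             {"tree": "alpha_hoped_2", "op": "substitution"}]},
    ],
}
```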
    • 4.3.2 Syntax-Semantic Interfaces for Derived Trees In Chapter 2 we briefly discussed two similar TAG-based approaches to incremental discourse structure, one ([Web91]) which outlines “right frontier” constraints on discourse deixis reference, and another ((LDM - see [Pol96, vdB96]) which outlines how the “right frontier” and the discourse relation between discourse units account for constraints on the antecedents of NP anaphora in local discourse spans. Both of these approaches employ the same two operations ([Gar97b]) for combining incoming elementary discourse units (clauses), although their terminology varies, and both use a similar syntax semantic interface with respect to the derived tree. The operations, called attachment and adjunction in [Web91], and their effect on the semantic information at each node, are illustrated again respectively in Figure 4.18. Figure 4.18: Illustration of [Web91]’s Attachment and Adjunction Operations In the first tree for the attachment operation, two nodes have already combined (by root adjunction), and their semantic information has been combined in the parent node, e.g. “(1,2)”. Attachment of a third node to this parent node creates the second tree and causes the semantic information “(3)” associated with this incoming node to be incorporated into the semantic information of the parent node to which it attaches to yield “(1,2,3)”. Roughly, attachment, e.g. inclusion in an existing discourse segment, corresponds in both approaches to the semantic list relation. A discourse corresponding to the result of attachment is shown in (4.19). (4.19) I like summer and I like winter and I like autumn. In the first tree for the adjunction operation, again two nodes have already combined (by root adjunction), and their semantic information has been combined in the parent node, e.g. “(1,2)”. Adjunction of a third node to this parent node (root) creates a new discourse segment node (whose 179
    • daughters are the original parent and the incoming node) as shown in the second tree, and causes the semantic information of both children to be associated with this new node to yield “((1,2),3)”. Roughly, adjunction, e.g. creation of an embedded discourse segment, corresponds in both approaches to beginning a list relation, begining a temporal progression, a causal relation, etc. A discourse corresponding the result of adjunction is shown in (4.20), where the first two clauses will be embedded under a cause node. (4.20) John joined the soccer team and Mike joined the football team because they wanted to impress their fathers. In both approaches, these operations are constrained to apply to nodes on the right frontier (the smallest set of nodes containing the root such that whenever a node is on the right frontier, so is its rightmost child[Web91]). Where the approaches differ is with respect to upward percolation of semantic information as a result of adjunction at nodes other than the root. [Web91] does not provide formal details concerning the syntax semantic interface or explicitly address the issue of upward percolation, however her examples of adjunction at a leaf show upward percolation of semantic information after adjunction at a leaf, as shown in figure. In this tree, after adjunction of to , not € i only does their new parent node contain the information (b,c), but this information has percolated up to the root node, replacing (a,b) in the first tree with (a,(b,c)) in the second tree. Figure 4.19: Webber’s Adjunction at a Leaf In contrast, although LDM explictly defines how the adjunction operation combines the semantics at each child node in the new parent node, they do not address the need for upward percolation. [Gar97b] argues that this lack of upward percolation is a problem for LDM, because there is no obvious way to read semantics off trees and there is no way to retrieve the available antecedents of 180
    • discourse deixis off the right frontier of the tree, because the necessary semantic information is not made available there. Gardent illustrates this using [Web91]’s example, first presented in Chapter 2 and repeated in (2.41)-(2.42). As Webber notes, the discourse deictic reference in (2.42) is ambiguous; it can refer to any of the nodes on the right frontier of the (derived) tree: (the nodes associated with) clause (2.41e), clauses (2.41d)-(2.41e), clauses (2.41c)-(2.41e), clauses (2.41a)-(2.41e). (2.41a) It’s always been presumed that (2.41b) when the glaciers receded (2.41c) the area got very hot. (2.41d) The Folsum men couldn’t adapt, and (2.41e) they died out. (2.42) That’s what’s supposed to have happened. It’s the textbook dogma. But it’s wrong. Under LDM’s approach, the tree for (2.41) is shown in Figure 4.20. Neither the root nor the right frontier describes the semantics of this discourse correctly. For example, what we interpret as presumed is not only , but all of - , and what we interpret when relating is not and , but and i F € F € € € F i proposition denoted by when(b,c)12 . ˜Gˆ1˜ j † € - . Moreover, resolution of that in (2.42) to - will incorrectly yield as a semantics for the presume(a, b) ú ú û û when(b, c) ú ú û û cause(c, d) c cause(d, e) — ˜˜ b ú ú û û a d e Figure 4.20: Derived Tree for Example (2.41) [Gar97b] proposes DTAG, a grammar that uses modified versions of the syntactic tree-construction operations and feature structures in feature-based TAG (FTAG) (see [Gro99, Shi86]) along with LDM’s semantics and lexicon to solve LDM’s problems arising from lack of upward percolation. 12 Lack of percolation is not a problem for root adjunction because it retains the semantic information of its children. 181
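Before turning to DTAG's feature-based repair, the two tree-growing operations of Figure 4.18, and the percolation gap just described, can be made explicit in a small sketch (mine, not [Web91]'s or LDM's implementation): adjunction below the root builds the new segment node, but nothing in the sketch percolates the new semantics up to the old ancestors.

```python
from dataclasses import dataclass, field

@dataclass
class DNode:
    sem: tuple                        # e.g. (1, 2) for a combined segment
    children: list = field(default_factory=list)

def attach(segment: DNode, new: DNode) -> None:
    # add the incoming unit to an existing right-frontier segment: (1,2) -> (1,2,3)
    segment.children.append(new)
    segment.sem = segment.sem + new.sem

def adjoin(root: DNode, site: DNode, new: DNode) -> DNode:
    # create a new embedded segment whose daughters are `site` and the new unit
    parent = DNode(sem=(site.sem, new.sem), children=[site, new])
    if site is root:
        return parent
    _replace_child(root, site, parent)
    # NB: parent.sem is not propagated to site's former ancestors here;
    # this is exactly the missing upward percolation [Gar97b] points out
    return root

def _replace_child(node: DNode, old: DNode, parent: DNode) -> None:
    node.children = [parent if child is old else child for child in node.children]
    for child in node.children:
        if child is not parent:
            _replace_child(child, old, parent)

# (4.19)-style attachment, then (4.20)-style adjunction at the root:
seg = DNode((1, 2), [DNode((1,)), DNode((2,))])
attach(seg, DNode((3,)))                  # seg.sem == (1, 2, 3)
tree = adjoin(seg, seg, DNode((4,)))      # tree.sem == ((1, 2, 3), (4,))
```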
    • FTAG is identical to LTAG with respect to its structure building operations. However FTAG associates each tree node , except for substitution nodes, with a top ( ) and bottom ( ) feature ˜ € ‡ to its super-tree and bottom features capture the ‡ ‡ relation of  structure. Top features capture the relation of to its sub-tree. Substitution nodes have only top features since the tree substituting in logically carries the bottom features [Gro99, 8]. As shown in Figure 4.21, after substitution, the top features of the substition node unify with the top features of the substituted node, and the bottom features are provided by the substituted node. As defined in [JVS99], unification created the union (U) of the specified features and the replacement of any feature variables with features that contain values for these variables. Once processing at a node is complete, its top and bottom features unify. ] Z Y U Ž Y [ ü€ ý gWü ý wW˜ wW˜ Ë ú ú û û €ü ý ŽgW ü ý gW˜  Z X gW˜ X u +  Í Í Î Í Î Î Y Figure 4.21: Substitution in FTAG As exemplified in Figure 4.22, after adjunction, the node in X being adjoined into splits, and its top features unify with the top features of the root adjoining node, while its bottom features unify with the bottom features of the foot adjoining node. Two sets of features allow the semantic relationships between X and its sub-tree and super-tree to be maintained after adjunction. Again, once processing at a node is complete, its top and bottom features are unified. Y Figure 4.22: Adjunction in FTAG 182 U € B Y Ž §™ W Y U A þ€ ÿ w Wþ ÿ ™W˜  Í Í € Î Î Ž g WÍ Î wW˜ B X gW˜ € Ž™W ™W˜ Ì  •• € –– ŽgW • – gW˜  € þ ÿ ŽwW þ ÿ wW˜  Y u úú ú û ûû A Y  Í Í Î Í Î Î + X
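The unification step that drives these figures can be sketched minimally by treating feature structures as flat dictionaries; real FTAG feature structures are typed and recursive, so this is only a toy of my own.

```python
def unify(f, g):
    # two flat feature dictionaries unify iff they agree on shared keys;
    # the result is their union (None signals unification failure)
    merged = dict(f)
    for key, value in g.items():
        if key in merged and merged[key] != value:
            return None
        merged[key] = value
    return merged

def finish_node(top, bottom):
    # once processing at a node is complete, its top and bottom features unify
    result = unify(top, bottom)
    if result is None:
        raise ValueError("ill-formed derivation: top and bottom features clash")
    return result

print(unify({"case": "acc"}, {"agr": "3sg"}))    # {'case': 'acc', 'agr': '3sg'}
print(unify({"case": "acc"}, {"case": "nom"}))   # None -> unification fails
```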
    • In [Gro99]’s description, features capture how predicates constrain or assign semantic attributes such as case, agreement, number etc, of the lexical items they take as arguments. For example, prepositions assign accusative case to their substituted internal NP arguments. In a PP elementary tree, this assignment is explicitly stated by an “assign case feature” [Gro99, 24], because NPs are given a default case feature value. Without this feature, feature-unification after substitution would not alter the default value of the case features of a substituted NP. DTAG uses feature structures and operations to allow upward percolation of semantic information. DTAG contains 1) a set of “discourse basic” (B) trees, which are single-node trees implementing the elementary discourse units (DCUs) and their typed feature structures in LDM, 2) a set of “discourse rule” (R) trees, each of which implements a discourse grammar rule of LDM, and 3) operations for combining them. Each node of a DTAG tree is associated with two sets of feature structures modeled after those in FTAG. Recall the LDM typed feature structure of the DCU John smiled repeated in Figure 4.23, where represents the semantics of John smiled13 . | {z q'G@ } 8ƒƒ‚ € ~ iC@ 9j ~ SEM SCHEMA | {z G'q@ „ G'q@ | {z basic represents DCU type and Figure 4.23: LDM Elementary DCU In DTAG, this feature structure is identified with both the top and bottom features of an elemen- } ‚ „„ | {z G'q@ € iC @ 9gqj  SEM | {z G'q@ €} iC @ 9gqj  BOTTOM „ } ~~ SEM ‚ TOP ƒ ƒƒ ƒƒ 8ƒƒ‚ tary DCU, as shown in Figure 4.24. ~~ ~~ ~ Figure 4.24: DTAG Elementary DCU 13 The value of the SCHEMA feature is identical to the SEM feature in LDM elementary DCUs. 183
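Anticipating the R trees described next, the DTAG setup can be sketched in the same dictionary style. This is again my own simplification of Figures 4.24-4.26: it keeps only the SEM/SCHEMA content and leaves out the conjunction of R(A,B) with the daughters' top features.

```python
def elementary_dcu(sem):
    # a basic DCU: SEM and SCHEMA values copied into both top and bottom features
    features = {"sem": sem, "schema": sem}
    return {"top": dict(features), "bottom": dict(features)}

def r_tree(relation, left, right):
    # the parent's bottom SEM applies the relation R to the daughters' SEMs
    parent_sem = f"{relation}({left['bottom']['sem']}, {right['bottom']['sem']})"
    return {"top": {}, "bottom": {"sem": parent_sem}, "children": [left, right]}

b = elementary_dcu("trains-not-running")
c = elementary_dcu("buses-not-running")
print(r_tree("list", b, c)["bottom"]["sem"])   # list(trains-not-running, buses-not-running)
```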
    • Recall further the LDM list grammar rule from Chapter 2 shown again in Figure 4.2514 , which states that any two discourse trees can combine to form a new tree of type list that represents a list relation (indicated by SEM). The value of an additional SCHEMA feature is required to be a non-trivial generalization over the meaning of the two trees. } F † g j‹•Y"i ‡ g •F ‹"‡ SEM SCHEMA „ FF o˜ gppE ~ ~ } 8ƒƒ‚ wˆj‹•Y"i ‡ F † w ‡ •F „ u FF o˜ gppE ~ SEM SCHEMA 8ƒƒ‚ ~ } 8ƒƒ‚ | g "7Bˆ"RŠq” •F ‡‰ w •F ‡z ‡ F „ g‹7‰ w "ŒRg5DA | g • F ‡ •F ‡z˜ @C ˜@C g5DA ~ SEM SCHEMA , ~ Figure 4.25: LDM List Rule Again, in DTAG each node is expanded to contain both top and bottom features, thereby producing the generalized DTAG R tree version shown in Figure 4.2615 . In this tree, R is instantiated by the discourse relation holding between daughters, (e.g. list, cause, etc.). As shown, the bottom features of the parent node is the conjunction of the application of R to the bottom features of the children nodes (A and B) with top features (TA and TB) of the children nodes. } TB ï î î „ ~ ï î î ð ð ð ð ð }ð ð ñ ‚ƒ ñ ñ ñ ñ ñ ñ t î t SEM TA TOP SEM TB SEM A BOTTOM SEM B ï TA „ R(A,B) ‚ƒ SEM ï î ï ~ } BOTTOM T ï TOP SEM „ BOTTOM 8ƒ‚ TOP ~ Figure 4.26: DTAG R Tree Gardent states that her version of the FTAG substitution operation, called -substitution, is E   unchanged except that it is restricted to the substitution of any tree into the right leaf of any tree. Her illustration of the -substitution operation is shown in Figure 4.27. E DTAG’s version of the FTAG adjunction operation, called -adjunction, is also unchanged, E except that it is limited to the right frontier and is no longer limited to “recursive structures” (e.g. 14 These grammar rules are simplified; see [Pol96] for additional information that is contained in a complete LDM feature structure. 15 Gardent does not include the SCHEMA feature. 184
optional modification structures whose root and foot nodes are identical). Her illustration of the adjunction operation is shown in Figure 4.28. Although not shown in the figure, [Gar97b, 12] states that "on adjoining, the tree dominated by the leftmost daughter of the local tree being adjoined is 'closed-off' in that all its [top] and [bottom] categories are unified (which in effect prevents any adjunction to this subtree)". This means nodes not on the right frontier are not available for processing.

Footnote 16: Gardent also argues that the attachment operation can be modeled using adjunction.

Figure 4.27: [Gar97b]'s Substitution Operation

Figure 4.28: [Gar97b]'s Adjunction Operation

As illustration, consider [Gar97b]'s explanation of the derivations of the discourse in (4.21). For clarity, she removes bracketing and feature structure labeling.

(4.21 a) Dick does not come to work
(4.21 b) because the trains aren't running
(4.21 c) and buses aren't either.

Gardent asserts that the DCU for (4.21 b) can be substituted into the causal R tree. This tree is then adjoined to the DCU for (4.21 a), making it unavailable for further processing, and its
features unify. The result is shown in the first tree in Figure 4.29, where a and b represent the eventualities described by (4.21 a) and (4.21 b) respectively. The elementary DCU for (4.21 c) is then substituted into the list R tree. This tree is then adjoined to the (b) node of the first tree, making it unavailable for further processing and its features unify, yielding the second tree in Figure 4.29. When processing is complete, all remaining top and bottom features unify. The semantics of the root becomes: cause(a, b) ∧ list(b, c).

Figure 4.29: First DTAG Derivation of Example (4.21)

There is another interpretation of (4.21), however, namely that both b and c together cause a. Gardent argues that this interpretation can be derived if (4.21 c) first substitutes into the list R tree, which then adjoins to (4.21 b), yielding the first tree in Figure 4.30. This tree then substitutes into the causal R tree. The result is then adjoined to (4.21 a), yielding the second tree in Figure 4.30. When processing is complete, all remaining top and bottom features unify. The semantics of the root becomes: cause(a, (list(b, c) ∧ b ∧ c)).

Figure 4.30: Second DTAG Derivation of Example (4.21)
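The restriction of both DTAG operations to the right frontier can be pictured with a small sketch. The code below is a Python illustration of the general idea rather than of Gardent's formal definitions; all names are our own. It marks which nodes of a discourse tree remain open for further attachment.

```python
# Sketch: only nodes on the right frontier of a discourse tree stay open
# for further substitution/adjunction; all other nodes are "closed off".
# Names (DNode, right_frontier) are illustrative, not Gardent's.

class DNode:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

def right_frontier(root):
    """Return the labels of nodes on the path from the root to the
    rightmost leaf, the only attachment sites DTAG allows."""
    frontier = []
    node = root
    while True:
        frontier.append(node.label)
        if not node.children:
            return frontier
        node = node.children[-1]   # always descend into the rightmost child

# First derivation of (4.21): cause(a, b) with list(b, c) built at b.
tree = DNode("cause", [DNode("a"), DNode("list", [DNode("b"), DNode("c")])])
print(right_frontier(tree))   # ['cause', 'list', 'c'] -- 'a' and 'b' are closed off
```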
In its essence, DTAG presents a viable approach to obtaining the compositional semantics of a discourse from its discourse tree. However, it is not clear that the feature structures in Figures 4.29 and 4.30 have been constructed according to the definitions of the DTAG substitution and adjunction operations. "Irrelevant information" has been omitted from these figures [Gar97b, 15]; if we step through each stage, however, it is not clear that we would obtain the same feature structures.

Consider again the second derivation of (4.21). To avoid conflating the identities of the TB variables in the two R trees involved in this derivation, we will use TX in one tree and TZ in the other tree. In the first step, the DCU for (4.21 c) substitutes into the right daughter of the list R tree, which by the definition of substitution produces the union of the top and bottom features, as shown in Figure 4.31.

Figure 4.31: Step One in the Second DTAG Derivation of Example (4.21)

In the second step, the list R tree adjoins to the DCU for (4.21 b), which by the definition of adjunction produces the union of the top features of the DCU for (4.21 b) with the top features of the parent node of the list R tree, and the union of the bottom features of the DCU for (4.21 b) with the bottom features of the left daughter node of the list R tree, as shown in Figure 4.32.

Figure 4.32: Step Two in the Second DTAG Derivation of Example (4.21)

The result is not identical to the first tree in Figure 4.30, which is Gardent's representation of this stage. Part of the difference is due to the fact that the leaf corresponding to (4.21 b) has been pushed off the right frontier, so according to the definition of adjunction, its top and bottom
features unify. The result is that TB and B are instantiated as the single available value, b. We can then replace all other instances of these variables with b, producing the tree in Figure 4.33.

Figure 4.33: Step Three in the Second DTAG Derivation of Example (4.21)

This tree is still not identical to Gardent's version. One reason is that we have not fully unified the rightmost leaf, because it is not clear what Gardent has done. She has replaced C with c, but she has not replaced TC with c. Both TC and C are variables and c is a value, and all are of like type (e.g. BOTTOM-SEM and TOP-SEM, respectively). We will thus complete the derivation without resolving variables after the union of features at right frontier nodes, and see what results. Another difference is that Gardent has not shown the union "b U T1" in the parent node, as would be expected after adjunction. Gardent does say that adjunction to a leaf is a special case, but this union would also be produced after adjunction at other nodes. We will thus include this union.

In step four the list tree substitutes into the right-most leaf of a cause R tree, producing the union of the top and bottom features, as shown in the third tree in Figure 4.34. Note that in Gardent's version (the second tree in Figure 4.30), Gardent has identified Z with list(b, c) ∧ b ∧ TC, and she has also identified TZ with T1, in contrast to above, where she did not identify TC with c.

Figure 4.34: Step Four in the Second DTAG Derivation of Example (4.21)

In step five, the resulting cause tree adjoins to the DCU for (4.21 a), yielding the third tree in
Figure 4.35.

Figure 4.35: Step Five in the Second DTAG Derivation of Example (4.21)

Now the leaf corresponding to (4.21 a) has been pushed off the right frontier, so by adjunction its variables are instantiated as a, as are all instances of TA and A, producing the tree in Figure 4.36.

Figure 4.36: Step Six in the Second DTAG Derivation of Example (4.21)

At this point there is no more input and so all top and bottom features unify. The problem is that in most nodes containing variables we no longer have a single value with which they can be identified. We do have a single value in our rightmost leaf, so we unify TC and C with c, and then identify all other instances of TC and C with c, producing the tree in Figure 4.37.

Again, however, this tree does not correspond to Gardent's intended final derivation for (4.21). In order to reproduce her intended derivation, we have to understand why top features of elementary DCUs are not always identified with top features in the substituted node after substitution, and we have to understand why top features of nodes being adjoined to are not unified with top features in the parent node of the adjoining tree. As stated above, [Gar97b, 12] does assert that adjunction to a leaf is a "special case", but she does not define it, and the same problems would arise after
adjunction at other nodes on the right frontier.

Figure 4.37: Step Seven in the Second DTAG Derivation of Example (4.21)

One way to reproduce Gardent's final derivation would be to explicitly state how features "unify" after adjunction and substitution. We would require that in substitution, the bottom features of the node substituted into are replaced by the bottom features of the substituting node, while the top features of the substituting node are replaced by the top features of the node substituted into; in the notation of Figure 4.27, the substitution site would receive the incoming bottom features and keep its own top features. We would require that in adjunction, the bottom features of the left daughter of the adjoining tree are replaced by the bottom features of the tree being adjoined to, but the top features of the adjoining tree replace the top features of the tree being adjoined to; in the notation of Figure 4.28, the adjoining tree's root top features would replace those of the adjunction site, while its left daughter would receive that site's bottom features. Alternatively, we might define substitution nodes as having only top features, as in FTAG, and elementary DCUs as having only bottom features; we would then only have to define how adjunction affected feature unification.

4.3.3 A Syntax-Semantic Interface for DLTAG Derivation Trees

Because in DLTAG discourse connectives are predicates, both syntactically and semantically, DLTAG can build both the syntax and the compositional semantics of these predicates using the same syntactic and semantic mechanisms that are used to build the syntax and compositional semantics of predicates at the clause level. As discussed above, DLTAG currently uses the LTAG grammar to build discourse syntax. As such, in this section we explore how the [JKR03, Kal02, KJ99, JVS99] syntax-semantic interface for LTAG, which employs a flat semantic representation using ideas from Minimal Recursion Semantics
([CFS97]) and Hole Semantics ([CFS97, Bos95]), can also apply to DLTAG. This section will be exploratory rather than conclusive, as a complete syntax-semantic interface incorporating all aspects of DLTAG and discourse connectives requires a thesis in its own right.

As introduced in Section 4.2, [JKR03, Kal02, KJ99, JVS99] argue that producing a compositional semantics for LTAG should correspond to establishing semantic predicate-argument relationships using the LTAG derivation tree, because it is this tree, rather than the derived tree, that reflects these relationships. To illustrate the basics of the extension of their interface to DLTAG, we begin with a few simple examples of two-sentence discourses connected by structural discourse connectives. We will then consider more complex discourses and discuss why [JKR03, KJ99]'s use of multi-component TAGs and/or [Kal02]'s enriched derivation tree is sometimes needed to build the compositional semantics of discourse.

Consider first (4.22), where two clauses are linked by the subordinating conjunction because.

(4.22) John likes Mary because she walks Fido.

The elementary DLTAG trees for (4.22) are shown in Figure 4.38. SC represents subordinating conjunction. Although the internal structure of clauses is accessible in the DLTAG parse ([FMP+01]), in all of our examples in this section we model atomic clauses with elementary initial trees (e.g. the tree for John likes Mary). As discussed above, D is a generic discourse unit root label used to represent both atomic clauses and constituents composed of clauses and predicates.

Figure 4.38: DLTAG Elementary Trees for Example (4.22)

Figure 4.39 shows the semantic representations of the elementary trees in Figure 4.38. In these and all of our following semantic representations we will employ the semantics outlined in [JKR03, Kal02, KJ99]. In our discourse-level extension, however, their propositional labels (e.g. l₁) will now be associated with the semantics of discourse units, both atomic and complex, and their
propositional variables (e.g. s₁) will now be used as discourse unit variables, whose values are these labels. Furthermore, we will employ a simplified representation of clause semantics, akin to the treatment of NPs in [Kal02], in which the values of the clause arguments are already provided. For example, we will use like'(j, m) for the semantic value of the clause John likes Mary.

Figure 4.39: Semantic Representations of the Elementary Trees for (4.22)

As shown, because takes two substituted arguments that are associated with positions in its elementary tree. Figure 4.40 shows the derived and derivation trees that result from substituting the trees for John likes Mary and she walks Fido into these positions in the because tree, along with the semantic representation that results.

Figure 4.40: DLTAG Derived and Derivation Trees and Semantic Representation for (4.22)

As discussed in Section 4.2, combining semantic representations in this approach consists of building the union of the semantic representations of the elementary trees involved in the derivation and assigning values to argument variables, where the derivation tree indicates how argument variables get values: when a tree is substituted, its value is applied to the argument variable paired with the position at which it attaches. For example, the derivation tree in Figure 4.40 indicates that the s₁ argument variable in the because elementary tree should be assigned the label of John likes Mary, because that tree substitutes in at position (1) and position (1) is associated with the s₁ argument variable. In the same way, the derivation tree indicates that s₂ should be assigned the label of she walks Fido.
The flat semantics shown in Figure 4.40 leads to the embedded semantic representation in (4.23).

(4.23) like'(j, m) ∧ walk'(m, f) ∧ because'(like'(j, m), walk'(m, f))

The analysis of a discourse containing the coordinating conjunction and and two clausal arguments proceeds similarly. Consider for example the discourse in (4.24).

Footnote 17: As discussed above, so and but take an initial tree and are analysed like because.

(4.24) John saw Mary and he kissed her.

The elementary DLTAG trees for (4.24) are shown in Figure 4.41. CC indicates coordinating conjunction. This analysis also applies to the empty connective ε, as discussed above.

Figure 4.41: DLTAG Elementary Trees for Example (4.24)

Figure 4.42 shows the semantic representations of the elementary trees in Figure 4.41. Only the substituted argument of and (s₂) is linked to an address; as discussed in Section 4.2, in a substitution step at a position, the argument variable linked to that position will get the value of the substituted element, and in an adjunction step, the argument variable of the incoming tree is added to the representation and assigned the value at the position where it attaches.

Figure 4.42: Semantic Representations of the Elementary Trees for (4.24)

Figure 4.43 shows the derived and derivation trees that result from substituting the tree for he kissed her into the and tree and adjoining the result to the tree for John saw Mary, along with the semantic representation that results. Again, the derivation tree in Figure 4.43 indicates that the value of the s₂ argument variable in the and elementary tree is assigned the semantic label of he kissed her, and further indicates that the s₁ argument variable in the
and elementary tree is added to the representation and assigned the semantic label of John saw Mary. This flat semantic representation leads to the embedded semantic representation shown in (4.25).

(4.25) see'(j, m) ∧ kiss'(j, m) ∧ and'(see'(j, m), kiss'(j, m))

Figure 4.43: DLTAG Derived and Derivation Trees and Semantic Representation for (4.24)

Consider now the more complex discourse in (4.26). The arguments of because are italicized, and the arguments of and are bracketed. To make it clear that these are the interpreted semantic arguments of each connective, we include a contextual question.

(4.26) (Who is happy and who is sad?) [Mary is happy because her husband found a job] and [John is sad].

Figure 4.44 shows the derived and derivation trees and the semantic representation for (4.26) after substituting the trees for Mary is happy and her husband found a job into the because tree, substituting the tree for John is sad into the and tree, and adjoining the and tree to the root of the because tree.

Figure 4.44: DLTAG Derived and Derivation Trees and Semantic Representation for (4.26)

This flat semantic representation leads to the embedded semantic representation shown in (4.27).

(4.27) is-happy'(m) ∧ find'(h, j) ∧ is-sad'(j) ∧ because'(is-happy'(m), find'(h, j)) ∧ and'(because'(is-happy'(m), find'(h, j)), is-sad'(j))
This is the expected interpretation; by adjoining to the root of the because tree, the value of the and adjunction argument variable is the label corresponding to the entire because tree.

But now consider the similar discourse in (4.28). Again, the arguments of because are italicized, the arguments of and are bracketed, and to make it clear that these are the interpreted semantic arguments of each connective, we include a contextual question.

(4.28) (Why is Mary happy?) Mary is happy because [her husband found a job] and [he likes it].

Figure 4.45 shows the derived and derivation trees and the semantic representation for (4.28) after substituting the trees for Mary is happy and her husband found a job into the because tree, adjoining the and tree to the leaf for her husband found a job in the result, and substituting the tree for he likes it into the and tree.

Figure 4.45: DLTAG Derived and Derivation Trees and Semantic Representation for (4.28)

This flat semantic representation leads to the embedded semantic representation shown in (4.29).

(4.29) is-happy'(m) ∧ find'(h, j) ∧ like'(h, j) ∧ because'(is-happy'(m), find'(h, j)) ∧ and'(find'(h, j), like'(h, j))

Assuming that Mary being happy is caused by both her husband finding a job and liking it, this is not the interpretation produced by this approach, wholly because the and adjunction is not retrieved from the argument position (3) of because in the derivation tree.

Footnote 18: Note that the problem becomes enormously more complex when we consider the current DLTAG parser's "lowest adjunction" default for the empty connective. See [FMP+01] for details about this default.
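To make the composition step concrete, here is a minimal Python sketch of how a flat semantic representation can be assembled by walking the derivation tree and filling argument variables from the attachment positions. The names, labels and data layout are our own, not those of [JKR03, Kal02, KJ99]; the sketch also shows why an adjunction below a substitution site, as in (4.28), is not reflected in the higher connective's argument.

```python
# Sketch: assemble a flat semantics from a derivation tree.
# Substitution: the host's argument slot at that position is filled with
#   the label of the incoming tree.
# Adjunction: the incoming tree's own open slot is filled with the label
#   of the host it adjoins to.
# All names and the data layout are illustrative, not from the cited work.

TREES = {
    "because": {"label": "l4", "formula": "because'(<1>, <3>)"},
    "and":     {"label": "l5", "formula": "and'(<adj>, <3>)"},
    "happy":   {"label": "l1", "formula": "is-happy'(m)"},
    "find":    {"label": "l2", "formula": "find'(h, j)"},
    "like":    {"label": "l3", "formula": "like'(h, j)"},
}

# Derivation tree for (4.28): children are (operation, position, daughter).
DERIV = ("because", [
    ("subst", "1", ("happy", [])),
    ("subst", "3", ("find", [
        ("adjoin", "0", ("and", [("subst", "3", ("like", []))])),
    ])),
])

def compose(node, flat):
    name, children = node
    tree = dict(TREES[name])                  # copy, so edits stay local
    for op, pos, child in children:
        child_label = compose(child, flat)
        if op == "subst":                     # fill the host's slot
            tree["formula"] = tree["formula"].replace(f"<{pos}>", child_label)
        else:                                 # adjunction: fill the child's open slot
            flat[-1] = flat[-1].replace("<adj>", tree["label"])
    flat.append(f"{tree['label']}: {tree['formula']}")
    return tree["label"]

flat = []
compose(DERIV, flat)
print(flat)
# because' receives l2 (find') at position <3>; the and' relation only sees
# l2 via its adjunction slot, so and'(l2, l3) never becomes because''s argument.
```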
In fact, the same problem arises in clause-level conjunction. Consider for example (4.30).

(4.30) John likes Mary and Sue.

In (4.30), the unit Mary and Sue is built by adjunction of an auxiliary and tree to the NP tree for Mary, and the substitution of Sue into the and tree. The and tree here is identical to the auxiliary and tree in DLTAG, except that its root, substitution and adjunction nodes are all labeled NP. This tree, along with the LTAG derived and derivation trees for (4.30), are shown in Figure 4.46. As shown, the compositional semantics of the verb is likes(j, m), because Mary is what substitutes into address (2.2).

Figure 4.46: LTAG Derived and Derivation Trees for Example (4.30)

One solution at the clause level is to make the NP-conjoining auxiliary tree into an initial tree, but although this would solve the problem in (4.30), it is not only NP conjunction that yields this type of problem; any adjunction to a substituted element will not be reflected in these compositional semantics. This solution could also be applied at the discourse level, e.g. if DLTAG models and (and ε) with initial trees; then the and tree in example (4.28) could substitute into position (3) in the because tree, and produce the desired interpretation.

Footnote 19: This solution for NP coordination was suggested separately by Laura Kallmeyer and Aravind Joshi, personal communications. As Aravind Joshi notes, it is natural for NP coordination to be represented in LTAG with an elementary tree because any NP tree in LTAG has the (root) label NP; however, VP, whose coordination can also be viewed as S coordination in which two subject nodes are identified, does not exist in LTAG as an elementary tree whose root label is the same as an elementary tree that could conjoin it; rather, the root label of the VP tree is S. He suggests instead that a VP coordination tree be constructed dynamically during parsing.

There is a more general technical solution proposed in [Kal02] to deal with scope in French quantifiers, which have been analyzed as adjuncts that adjoin to an NP, as illustrated in Figure 4.47. We overlook details of the semantics of French quantifiers here; roughly stated, the quantifier semantics
requires access to the verb semantics, and vice versa, but there is no link (edge) in the derivation tree between them.

Footnote 20: Translation: chaque = each; chien = dog; aboie = barks.

Figure 4.47: Quantifiers in French

[Kal02, 104]'s solution to this problem is to enrich the derivation tree with an additional edge between the quantifier and the VP, as shown in Figure 4.48.

Figure 4.48: [Kal02]'s e-Edges for Quantifiers in French

This additional link, it is argued, makes explicit the intuitive dependency relationships we can already see in the derivation tree between the VP tree into which chien substitutes and the elements that adjoin to chien. In other words, [Kal02] proposes that in the case of an adjunction at the root node of some elementary tree γ (in this case chien), the adjoined tree (in this case chaque) is not only linked by an edge to γ, but is also linked by an additional edge to the tree to which γ was added in some previous derivation step, which is in this case the tree for aboie. [Kal02] notes that in fact this additional edge reflects the unification of features after substitution of γ into a node (which creates the union of top features; see Figure 4.21) followed by adjunction at the root of γ (which creates the union of the top features in the parent tree with the top features of the adjoining tree; see Figure 4.22).

More generally, [Kal02] proposes the following definition of the e-derivation graph (it is no longer a tree): all edges in the derivation tree are primary e-derivation edges. Furthermore, there
is a secondary e-derivation edge (e-edge) between two nodes γ′ and γ if, in the derivation tree, there are nodes γ′, γ₁, ..., γₙ = γ such that γ′ is a daughter of γ₁ with position 0 (adjunction at the root node of γ₁), γᵢ is a daughter of γᵢ₊₁ with position 0 for 1 ≤ i < n−1, and γₙ₋₁ is a daughter of γₙ. This definition is sketched in Figure 4.49.

Figure 4.49: [Kal02]'s e-Derivation Graph

[Kal02] then redefines how substitution and adjunction yield the semantic representation so as to incorporate the contribution of e-edges. By her new definitions, a variable in the semantic representation of a node can obtain its value either from a tree linked to it by a primary edge in the derivation tree, or from a tree linked to it by a secondary e-edge in the e-derivation structure. For example, a variable in the semantic representation of aboie can obtain its value from either chien or chaque. The only further requirement, obviously, is that the variable and the value it obtains are of like semantic type.

Returning now to the DLTAG example (4.28), what adding the additional e-edges effectively does is create ambiguity as to the interpreted semantic argument of because, because all variables and values we have considered so far are of like semantic type (e.g. discourse units). Consider the e-derivation graph for (4.28), shown in Figure 4.50 along with the derived tree. If we compute semantics based on this e-derivation structure, then in addition to the semantic representation of this discourse that we already discussed, shown in Figure 4.45 (where because takes the labels of Mary is happy and her husband found a job as its arguments), there is another possible semantic representation for this discourse, shown in Figure 4.51. This additional interpretation, which leads to the embedded semantic representation shown in (4.31), is the intended interpretation. It is produced by the additional e-edge, which makes
available the value of the and tree to the argument variable in the because tree.

(4.31) is-happy'(m) ∧ find'(h, j) ∧ like'(h, j) ∧ and'(find'(h, j), like'(h, j)) ∧ because'(is-happy'(m), and'(find'(h, j), like'(h, j)))

Figure 4.50: DLTAG Derived Tree and e-Derivation Graph for Example (4.28)

Figure 4.51: Additional Semantic Representation for (4.28) due to the e-Derivation Graph

The question then becomes one of whether we want this ambiguity. We certainly want some ambiguity; the problem is that we may not want the first interpretation of (4.28) to be possible at all. That is, we don't want to produce a representation for (4.28) in which because takes the labels of Mary is happy and her husband found a job as its arguments and and takes the labels of her husband found a job and he likes it, because it is not clear that this is even a possible interpretation of (4.28). Nor, perhaps, do we want the e-edge in Figure 4.50 to allow the and tree to resolve its adjunction variable to the interpretation of because. While doing so introduces a possible interpretation, in fact this is the same interpretation already achieved by adjunction of the and tree to the root of because, as we saw in (4.26).

In other words, there appear to be only two possible interpretations for (4.28): one is achieved by adjunction of and to the root of the because tree, and the other is made available by the e-edge that allows because to take the and tree as its second argument. The third representation, where because takes her husband found a job as its second argument and and takes her husband found a job as its first argument, is not clearly interpretable. To prevent such unwanted interpretations arising from DLTAG e-derivation graphs, we could of course define constraints on the effect that e-edges can have on the value of
semantic argument variables whose values are discourse unit labels. As illustration, let us define two constraints that restrict the semantic values made available by DLTAG e-edges:

Constraint (1): if there is an e-edge between two elementary DLTAG trees that both take two structural arguments (via substitution or adjunction) whose values are both discourse unit labels, then this e-edge will determine the value of the argument of the higher elementary tree.

Constraint (2): the argument value produced by Constraint (1) is the only additional argument value made available by e-edges between DLTAG elementary trees.

Footnote 21: This cannot be stipulated at the clause level, for chaque needs access to the label of aboie.

We briefly illustrate the effect of these constraints by considering the discourse in (4.32), where the intended arguments of because are bracketed. The derived tree and e-derivation graph for this example are shown in Figure 4.52.

(4.32) [Mary is happy] because [her husband found a job and he likes it and it pays good money].

Figure 4.52: DLTAG Derived Tree and e-Derivation Graph for Example (4.32)

As the derived tree already shows, and Constraint (1) requires, we want the higher and to take the value of the lower and as the value of its substitution argument variable, and we want because to take the value of the higher and as the value of its substitution variable. Constraint (2) would prevent the e-edges from allowing any other semantic interpretations, such as the lower and taking the value of the higher and as the value of its adjunction argument, since this can already be achieved by adjunction to the root of the higher and. Essentially, however, it should be obvious that
combining [Kal02]'s e-derivation graphs with constraints that reduce the interpretations yielded by these graphs is an attempt to avoid a situation where the DLTAG derivation structure (tree or graph) produces the wrong result, and is thus not a better basis for semantic interpretation than simply using the DLTAG derived tree in the first place.

There is an alternative solution proposed in [JKR03], however, which allows us to derive all and only the intended semantic interpretation(s) from the original DLTAG derivation tree without requiring tempering constraints or additional e-edges. This solution employs the notion of flexible direction of composition, which we illustrate first using a context-free grammar (CFG) rule such as A → BC. To produce A, we can view B as a function and C as its argument, or we can view C as the function and B as its argument. Because CFGs provide string rewriting rules, in which function and argument are 'string-adjacent' strings, this use of flexible composition affects neither the weak generative capacity (set of strings generated) nor the strong generative capacity (set of derivation trees generated) of the CFG. A TAG, however, provides tree rewriting rules. Function and argument in a TAG are complex topological arguments (trees) that are 'tree-adjacent'; thus, depending on how it is specified, the use of flexible composition in a TAG can potentially affect both its weak and strong generative capacity.

Footnote 22: This discussion of flexible composition in TAG is derived from [JKR03, 8].

[JKR03] specify the use of flexible composition in a TAG as follows: if a tree γ₁ composes into a tree γ₂, then γ₂ must be an elementary tree. If both γ₁ and γ₂ are elementary trees, the direction of composition can go either way. If both γ₁ and γ₂ are derived trees, they cannot compose with each other. Roughly stated, this definition of flexible composition allows the derivation tree to be traversed in a flexible manner while ensuring 'tree locality': the derivation tree can be traversed starting at any node, but as traversal continues the growing complex (derived) tree can only compose into 'tree-adjacent' elementary trees.

To take a simple example, consider the construction of the clause Funny people smile, whose syntax and semantics can be produced through composition of the three elementary LTAG trees shown in Figure 4.53, along with the resulting derived and derivation trees. Note that although the derivation tree represents how these elementary trees combine, and the derived tree represents the
resulting clause structure, neither of these trees shows the traversal order in which these elementary trees were composed. By the TAG definition of flexible composition, we can for example first compose the people tree into the smile tree, and then compose the resulting derived tree into the funny tree. Alternatively, we can first compose the funny tree and the people tree, and then compose the resulting derived tree into the smile tree.

Footnote 23: Allowing simultaneous composition is also necessary so as not to exclude standard TAG derivations [JKR03].

Figure 4.53: Flexible Composition in LTAG

Although these traversal orders are all possible by the definition of flexible composition in TAG, only the latter bottom-up traversal yields the desired semantics. As illustration, suppose the semantics corresponding to the people tree is people'(x), the semantics corresponding to the funny tree is funny'(y), and the semantics corresponding to the smile tree is smile'(z). If we first compose the funny tree and the people tree, then y is identified with people'(x); subsequent composition of this derived tree with the smile tree causes z to be identified with funny'(people'(x)), yielding the desired semantics: smile'(funny'(people'(x))). However, if we first compose the people tree into the smile tree, then z will be identified with people'(x), yielding the clause semantics smile'(people'(x)); composing the resulting derived tree into the funny tree identifies y with people'(x) too, yielding funny'(people'(x)), which is not embedded under smile'.

In this thesis, it is the impact of flexible composition on the computation of semantics from the DLTAG derivation tree that is our main concern. First, consider how flexible composition and an assumption of bottom-up traversal of the DLTAG derivation tree produces only the intended interpretation of (4.28), repeated below, where the intended arguments of because are bracketed.

(4.28) (Why is Mary happy?) [Mary is happy] because [her husband found a job and he likes it].

The derived and derivation trees for this example are repeated in Figure 4.54. We have already seen how a top-down traversal of the derivation tree yields an unintended semantic representation
for this discourse (see Figure 4.45); Figure 4.54 shows the intended semantic representation for this example, which results, very simply, from a bottom-up traversal of the derivation tree.

Figure 4.54: DLTAG Derived and Derivation Trees and Semantic Representation for (4.28)

We will now consider the analysis of two-sentence discourses in which the second sentence is modified by a discourse adverbial. We will show that the use of flexible composition, which allows a bottom-up traversal for DLTAG derivation trees, is sometimes crucial to obtaining the intended semantic interpretation. We will further illustrate how [KJ99]'s enriched derivation tree could be used instead, but would yield greater semantic ambiguity.

As discussed in Chapter 3, the distinction between clausal adverbials, which do not function semantically as discourse connectives, and discourse adverbials, which do function semantically as discourse connectives, is derived from their predicate-argument structure and interpretation. For example, in (4.33), the internal argument of the PP discourse adverbial is a demonstrative AO, this way, which refers to the interpretation of the prior sentence.

(4.33) The company interviewed everyone. In this way, they considered all their options.

Of course, the anaphoricity of demonstrative NP reference is not modeled structurally in (D)LTAG; as discussed in Chapter 3, the anaphoricity of pronouns can be represented semantically using assignment functions [HK98]. In general, [[x]]ᵃ is read as "x denotes an entity via an index that is mapped to that entity relative to an assignment function a, where a is determined by a context C". However, definite nouns have been represented semantically using partial functions: the domain of the definite article contains only nouns that correspond to one (and only one) entity in the set of individuals ([HK98]).
Although a definite noun thus presupposes one and only one entity corresponding to its denotation, it may or may not be anaphoric in the sense that it refers to a salient entity in the context. Demonstrative NPs are more akin to pronouns in that they usually are anaphoric in the above-mentioned sense. However, in LTAG, demonstrative determiners adjoin to an NP and their associated feature structure contains a +definite feature ([Gro99, 159]). As stated above, definite descriptions are not discussed in [JKR03, Kal02, KJ99]; as we will see below, they do discuss the semantic representation of quantifiers, modeling them with a "scope part" and a "predicate-argument part". As definite determiners are not quantifiers ([Gro99, 161]), we will extrapolate their semantic representation. We will also extend [JKR03, Kal02, KJ99]'s representation of PPs as NP-adjuncts to PPs as S-adjuncts.

The elementary trees for (4.33) are shown in Figure 4.55; to build the discourse structure, the elementary tree representing the empty connective (anchored by ε) is employed. Figure 4.56 shows the semantic representations of these elementary trees.

Figure 4.55: DLTAG Elementary Trees for Example (4.33)

Figure 4.56: Semantic Representations of the Elementary Trees for (4.33)
As shown, the representation of in contains s₂, which as we have already stated is an argument variable whose values are discourse-unit labels; it also contains x₁, an argument variable whose values are NP denotations. As in [Kal02], the representation of this contains p, an argument variable whose values are unary predicates, and the representation of way is labeled with the unary predicate value q. We represent the interpretation of the empty connective, e.g. the continuation of the discourse, simply as ε'; the rest of its representation is identical to that for and.

Figure 4.57 shows the derived tree, derivation tree, and e-derivation graph for this example, along with the semantic representation that would result from a bottom-up traversal of the derivation tree, e.g. from adjoining the this tree to the way tree, substituting the result into the in tree, adjoining the result to the tree for they considered all their options, substituting the result into the ε tree, and adjoining the result to the tree for The company interviewed everyone.

Figure 4.57: DLTAG Derived and Derivation Trees, e-Derivation Graph and Semantics for (4.33)

This leads to the final semantic representation shown in (4.34). Note that the anaphoricity of this way has not been represented or resolved by the compositional semantics; anaphora/demonstrative NP resolution should determine that it refers to interview'(c, e), the interpretation of the first sentence.

(4.34) interview'(c, e) ∧ consider'(c, o) ∧ ε'(interview'(c, e), in'(this'(way'(z)), consider'(c, o)))

Clearly, bottom-up traversal of the derivation tree yields the intended semantic interpretation. If instead we use a top-down traversal of the e-derivation graph, we could still achieve this interpretation.
    • ) and  T% %  4"42 ) tion. In this case, the -edge between is crucial; as in [Kal02], the representation of F the determiner this contains the argument which is identified with the substitution argument at po- s sition (1.2) in the representation of the preposition via the -edge between them. However, as noted F above, these -edges, if not constrained, allow numerous other interpretations. Moreover, Constraint because only one of  T%  T% ) and ) X e) F (1) does not require us to use the -edge between ’s arguments F is a discourse unit variable. The interpretation produced by considering a top-down traversal of primary edges would be that of two separate discourse relations: the relation conveyed by would H be between the first and second sentence - the fact that in this way adjoins to the second sentence would not be reflected in the relation. The relation conveyed by in this way between the second H sentence and the prior discourse would be reflected in a separate formula. Considering the -edge F could produce still additional interpretations; however, Constraint (2) would prevent additional interpretations arising from this -edge. Clearly, bottom-up traversal provides a better approach to F deriving the intended semantic interpretation from at least these DLTAG structures. As shown in Chapter 3, however, explicit AO reference, (e.g. this way), is not the only way an adverbial can function as a discourse connective. Both PP and ADVP adverbials may also contain hidden definite AO arguments that can sometimes be made explicit by a PP modifier, e.g. as a result (of that), consequently, for an example (of that). Consider for example the discourse in (4.35). (4.35) Mike found no new clients. Consequently, he lost his job. The elementary DLTAG trees for (4.35) are shown in Figure 4.58. As shown, consequently takes only a single structural argument; the discourse unit it modifies. Again, DLTAG uses the empty connective to build the structural connection between the two sentences. & % T0 ¡ D D 3 ¦ 918f¡ D ËS H Ì S •• • – –– S Ì S D S •• –– S• – ADVP D S D D ADV consequently Figure 4.58: DLTAG Elementary Trees for Example (4.35) 206
Figure 4.59 shows the semantic representations of the elementary trees in Figure 4.58. As shown, the representation of consequently contains s₂, which represents the interpretation of its adjunction argument, and it also contains [[s₁]]ᵃ, which represents a hidden AO argument that must be resolved anaphorically. Here we experiment with the explicit use of an assignment function to represent this anaphoricity; because the first sentence yields the intended value for this argument, we use s₁ as a generalized variable for AO interpretations.

Figure 4.59: Semantic Representations of the Elementary Trees for (4.35)

Figure 4.60 shows the derived tree, derivation tree, and e-derivation graph, along with the semantic representation that would result from a bottom-up traversal of the derivation tree, e.g. from adjoining the consequently tree to the tree for he lost his job, substituting the result into the ε tree, and adjoining the result to the tree for Mike found no new clients.

Figure 4.60: DLTAG Derived and Derivation Trees, e-Derivation Graph, and Semantics for (4.35)

This leads to the final semantic representation shown in (4.36).

(4.36) find'(m, c) ∧ lose'(m, j) ∧ ε'(find'(m, c), consequently'([[s₁]]ᵃ, lose'(m, j)))

Again, anaphora resolution should determine that the anaphor resolves to find'(m, c). And, as in the prior example, the e-derivation graph would produce this interpretation along with numerous other semantic interpretations.
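The division of labour just described, in which composition leaves [[s₁]]ᵃ unresolved and a separate anaphora-resolution step fixes its value, can be sketched as follows. This Python fragment is illustrative only: the candidate-ranking strategy (take the most recent abstract object) and all names are invented for the sketch, not a claim about how such resolution should actually be done.

```python
# Sketch: compositional semantics leaves the hidden AO argument of
# "consequently" as an unresolved anaphor; a later resolution step picks
# its value from abstract objects made available by the prior discourse.
# The resolution strategy here (most recent candidate) is purely illustrative.

flat_semantics = {
    "l1": "find'(m, c)",
    "l2": "lose'(m, j)",
    "l3": "consequently'([[s1]]^a, l2)",   # [[s1]]^a is anaphoric
    "l4": "eps'(l1, l3)",
}

# Abstract-object interpretations the prior discourse makes available,
# most recent last.
ao_candidates = ["l1"]

def resolve(flat, anaphor, candidates):
    """Replace an unresolved anaphor with the most recent AO candidate."""
    if not candidates:
        return flat                        # leave the anaphor unresolved
    value = candidates[-1]
    return {label: formula.replace(anaphor, value)
            for label, formula in flat.items()}

print(resolve(flat_semantics, "[[s1]]^a", ao_candidates)["l3"])
# -> "consequently'(l1, l2)", i.e. consequently'(find'(m, c), lose'(m, j))
```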
As noted above and in Chapter 3, in PP adverbials such as as a result (of that), the internal argument contains a hidden or overt anaphoric AO argument. When this argument is overt, we take the same tack as in [Kal02], where similar binary NP predicates (e.g. winner of NP) are modeled using NP elementary trees with a substituted PP argument. Consider for example the discourse in (4.37). The elementary DLTAG trees for (4.37) are shown in Figure 4.61.

(4.37) Mike found no new clients. As a result of that, he lost his job.

Figure 4.61: DLTAG Elementary Trees for Example (4.37)

As shown, there is no extra elementary tree for the preposition of. Rather, it is treated as semantically void and is part of the elementary tree for result, which selects for the PP containing its internal argument. Furthermore, this tree contains a substitution site for this internal argument. Figure 4.62 shows the semantic representations of the elementary trees in Figure 4.61.

Figure 4.62: Semantic Representations of the Elementary Trees for (4.37)
As shown, the representation of as contains s₂, which as usual represents the interpretation of the adjunction argument variable, and it also contains x₁, which represents the substituted NP. We also follow [Kal02]'s representation of the quantifier a, except that we condense the scope and predicate parts into a single formula; we will discuss scope issues below. Note that a supplies the individual variable for the binary predicate result, and takes an argument variable p to this predicate. Following [Kal02]'s representation of the binary predicate winner-of, result-of is represented as a binary predicate value; it takes a substituted argument, and the entity it denotes is treated as a bound variable which will be instantiated with the individual variable supplied by a. We treat the pronoun that as denoting an entity whose value must be fixed by an assignment function.

Figure 4.63 shows the derived and derivation trees, e-derivation graph and semantics that result from substituting the that tree into the result tree, adjoining the a tree to the result tree, substituting the result into the as tree, adjoining the result to the tree for he lost his job, substituting the result into the ε tree, and finally adjoining the result to the tree for Mike found no new clients.

Footnote 24: The traversal begins at frontier nodes; left or right ordering does not matter.

Figure 4.63: DLTAG Derived and Derivation Trees, e-Derivation Graph and Semantics for (4.37)

This leads to the final semantics shown in (4.38).

(4.38) find'(m, c) ∧ lose'(m, j) ∧ ε'(find'(m, c), as'(a'(result-of'(z, that'([[x₁]]ᵃ))), lose'(m, j)))

Again, anaphora resolution should determine that the anaphoric reference of the explicit internal argument of result resolves to find'(m, c).
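The step from a flat, labeled representation like the one in Figure 4.63 to an embedded formula like (4.38) is mechanical once every argument variable has a label as its value. The following is a minimal Python sketch of that expansion; the label names and exact strings are our own paraphrase of the analysis of (4.37), not the thesis's notation.

```python
# Sketch: expand a flat, label-based representation into an embedded formula
# by recursively substituting each label with the formula it names.
# Labels and strings here are illustrative only.

flat = {
    "l1": "find'(m, c)",
    "l2": "lose'(m, j)",
    "l3": "as'(a'(result-of'(z, that'([[x1]]^a))), l2)",
    "l4": "eps'(l1, l3)",
}

def embed(label, flat):
    """Recursively replace label occurrences with the formulas they name."""
    formula = flat[label]
    changed = True
    while changed:
        changed = False
        for other in flat:
            if other != label and other in formula:
                formula = formula.replace(other, embed(other, flat))
                changed = True
    return formula

print(embed("l4", flat))
# eps'(find'(m, c), as'(a'(result-of'(z, that'([[x1]]^a))), lose'(m, j)))
```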
Note that if we chose instead to employ a top-down traversal order using the e-derivation graph, the e-edge between the a tree and the as tree would be crucial, as the quantifier introduces the argument needed by the preposition. Also as above, e-edges could produce other interpretations.

As shown in Figure 4.64 and example (4.39), if the internal argument of result is hidden, we can represent result syntactically using the atomic NP tree, and represent its hidden argument only in the semantics.

(4.39) Mike found no new clients. As a result, he lost his job.

Figure 4.64: DLTAG Elementary Tree and Semantic Representation for result in (4.39)

Figure 4.65 shows the derived and derivation trees, e-derivation graph and semantics that result from adjoining the a tree to the result tree, substituting the result into the as tree, adjoining the result to the tree for he lost his job, substituting the result into the ε tree, and adjoining the result to the tree for Mike found no new clients. This leads to the semantics in (4.40).

Figure 4.65: DLTAG Derived and Derivation Trees, e-Derivation Graph and Semantics for (4.39)

(4.40) find'(m, c) ∧ lose'(m, j) ∧ ε'(find'(m, c), as'(a'(result'(z, [[x₁]]ᵃ)), lose'(m, j)))
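The contrast between the two traversal orders that the preceding examples rely on can be sketched briefly in Python, using the discourse in (4.28). The sketch is entirely illustrative: it simply spells out which unit the because' argument ends up identified with under each order, with invented names and strings.

```python
# Sketch: the order of composition determines which unit the second argument
# of because' is identified with. Bottom-up, the and' relation is built before
# the find' leaf is consumed by because'; top-down, because' has already fixed
# its argument to find' when and' adjoins. Names and strings are illustrative.

def compose_428(order):
    values = {"happy": "is-happy'(m)", "find": "find'(h, j)", "like": "like'(h, j)"}
    if order == "bottom-up":
        # 1. substitute 'like' into 'and'; 2. adjoin the result at 'find';
        # 3. only then substitute the enriched unit into 'because'.
        unit = f"and'({values['find']}, {values['like']})"
        return f"because'({values['happy']}, {unit})"
    else:  # top-down
        # 1. substitute 'find' into 'because' (argument now fixed);
        # 2. the later adjunction of 'and' at 'find' cannot change it.
        because = f"because'({values['happy']}, {values['find']})"
        conj = f"and'({values['find']}, {values['like']})"
        return f"{because} & {conj}"

print(compose_428("bottom-up"))   # because'(is-happy'(m), and'(find'(h, j), like'(h, j)))
print(compose_428("top-down"))    # because'(is-happy'(m), find'(h, j)) & and'(...)
```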
Now, we have shown how bottom-up traversal of the DLTAG derivation tree yields a single semantic interpretation for the above examples, as compared to the e-derivation graph, which would yield multiple additional interpretations of each example. The next issue to address concerns examples where we may want some semantic ambiguity, to see whether it can be achieved via flexible composition in the DLTAG derivation tree.

Recall that discourse adverbials do not compose with their left argument, and, because the sentence-level parse is retained at the discourse level, the interpretation of the right argument of a discourse adverbial is always the interpretation of the S it modifies. Recall further that, in the two-sentence discourses containing a structural connective and a discourse adverbial that we have so far considered, bottom-up traversal of the derivation tree always yields the structural connective's second argument as the interpretation of the second sentence modified by the discourse adverbial. Do we ever want, for example, an interpretation where the structural connective's second argument is instead the interpretation of the unmodified second sentence?

[WJSK03] identify four separate cases concerning the interaction of the relation supplied by a discourse adverbial with the relation supplied by a structural connective. Case 1 represents discourses in which the two connectives are interpreted as each supplying an independent relation to the discourse. The example they use was presented in Section 4.3 as an example of how discourse adverbials allow crossing dependencies, and is repeated below in (4.16). As already noted, the discourse adverbial then relates the ordering situation described in b. to the discovering situation described in d., which itself precedes (temporally) the canceling situation described in c. The structural connective because relates the canceling situation described in c. to the discovering situation described in d.

(4.16) a. John loves Barolo.
b. So he ordered three cases of the '97.
c. But he had to cancel the order
d. because then he discovered he was broke.

Contrast Case 1 with Case 2, which represents discourses where the relation supplied by the discourse adverbial is interpreted as the right semantic argument of the structural connective. The example of this case used in [WJSK03] is shown in (4.41). In this case, the interpretation can be
paraphrased as If the light is red, stop, because if the light is red and you do something other than stop, you'll get a ticket: [WJSK03] argue that the left argument of otherwise is the inferred situation where the light is red and you do something other than stop.

Footnote 25: See [WJSK03, KKW01b, KKW01a] for detailed discussion of the lexical semantics of otherwise.

(4.41) a. If the light is red,
b. stop
c. because otherwise you'll get a ticket.

If in Case 1 the right argument of because is the unmodified clause in (4.16 d), and in Case 2 the right argument of because is the modified clause in (4.41 c), then the DLTAG derivation tree can only account for these different interpretations by varying the traversal order, such that Case 2 is achieved via bottom-up traversal and Case 1 is achieved via top-down traversal. Allowing both traversals introduces semantic ambiguity into our analyses of all the examples addressed in this section that contain a structural connective and a discourse adverbial, such that a top-down traversal yields two separately conveyed relations, and a bottom-up traversal yields the discourse adverbial's relation embedded in the structural connective's relation. Note however that such ambiguity is still less than that allowed by e-edges. As illustration, the derivation tree and e-derivation graph for (4.41) are shown in Figure 4.66.

Figure 4.66: DLTAG Derivation Tree and e-Derivation Graph for Example (4.41)

Cases 3 and 4 correspond in some sense to the opposite of Case 2. Case 3 represents discourses where the relation supplied by a discourse adverbial to the second sentence is parasitic on the relation that the structural connective supplies between the first and second sentences, and Case 4 concerns cases where the relation supplied by the structural connective is incorporated into the semantics of the discourse adverbial as a defeasible rule. We focus below on Case 3; Case 4 would be handled similarly. An example of Case 3 ([WJSK03]) is shown in (4.42).
(4.42) John just broke his arm. So, for example, he can't cycle to work now.

The interpretation of this example is that John not being able to cycle to work is one example of the result of John breaking his arm. In other words, the left argument of for example is dependent on the relation supplied by the structural connective so between the first and second sentences. [WJSK03] argue that the interpretation of for example involves first abstracting the meaning of the left argument with respect to the meaning of the unit it modifies, and then making an assertion with respect to this abstraction. In their terms, if α represents the interpretation of he can't cycle to work now, β represents the interpretation of John just broke his arm, and result( , ) represents the interpretation of the relation so supplies to these arguments, then the interpretation of for example is as in (4.43), where exemplify represents the relation for example supplies.

(4.43) exemplify(α, λX. result(X, β))

Interestingly, [WJSK03] argue that the interpretation of for example thus resembles the interpretation of a quantifier, in that the scope of its interpretation is wider than is explained by its syntactic position. In other words, our semantic analysis should take into account the fact that in order to interpret for example, we appear to always abstract the interpretation of the preceding predicate. Note that although [WJSK03] do not give for example an analysis that follows from its internal predicate-argument structure and semantics (i.e., its hidden argument), treating it instead as an unanalysed lexeme, their approach is compatible with an analysis that takes its internal predicate-argument structure and semantics into account.

In order to understand [WJSK03]'s treatment of for example as a quantifier, we must backtrack and examine [JKR03, Kal02, KJ99]'s account of English quantifiers, which we have so far overlooked. Above we discussed how prior analyses of French quantifiers as NP-adjuncts led [Kal02] to propose deriving semantic interpretations from the e-derivation graph rather than the derivation tree. In English, however, quantifiers have been analysed both as NP adjuncts (auxiliary trees) and as NPs into which generic nouns substitute (initial trees). [JKR03, Kal02, KJ99] address the treatment of English quantifiers as initial trees, and analyse them as having a "scope" part and a "predicate-argument" part, which they represent using multi-component TAGs (MC-TAGs).

Footnote 26: See [Kal02] for references.
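Before turning to the quantifier machinery, a small sketch may help make the abstraction in (4.43) concrete. The Python fragment below only renders the shape of the analysis, not [WJSK03]'s formalization; the propositional constants and the helper function are invented placeholders.

```python
# Sketch of the shape of (4.43): "for example" asserts that the clause it
# modifies falls under a property obtained by abstracting over one argument
# of the relation supplied by the structural connective "so".
# The constants and helper below are placeholders, not a real formalism.

broke_arm = "broke-arm'(john)"        # beta: John just broke his arm
cant_cycle = "cannot-cycle'(john)"    # alpha: he can't cycle to work now

def result(x, y):
    """The relation contributed by 'so'."""
    return f"result({x}, {y})"

def being_a_result_of_breaking_arm(X):
    """The abstraction lambda X. result(X, beta) in (4.43)."""
    return result(X, broke_arm)

# Assert that the modified clause exemplifies that property.
interpretation = (f"exemplify({cant_cycle}, "
                  f"lambda X. {being_a_result_of_breaking_arm('X')})")
print(interpretation)
# exemplify(cannot-cycle'(john), lambda X. result(X, broke-arm'(john)))
```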
Briefly, each quantifier is associated with two elementary trees: one auxiliary tree consisting of a single node representing the scope part of the semantics of the quantifier, and one initial tree representing the predicate-argument part. These trees are shown respectively as the first two trees in Figure 4.67, along with their semantic representations. The scope part (shown first) introduces a proposition (l1) containing the quantifier, holes for its restrictive (h1) and nuclear (h2) scope, and its variable (x), together with an argument variable (s1) which is asserted to be in the nuclear scope of the quantifier (s1 ≤ h2). The predicate-argument part introduces a proposition (l2) containing a predicate variable (p1); this proposition is in the restrictive scope of the quantifier (l2 ≤ h1). When the noun tree for dog substitutes into the every tree, p1 obtains the value q1 (the label of dog'); when the resulting NP substitutes into the subject position of the barks tree, the verb's argument variable obtains the value x; and when the scope part adjoins to the root of the barks tree, s1 obtains the value l3 (the label of bark'(x)). The only possible disambiguation of the holes is then h1 = l2 and h2 = l3, which leads to the semantics: every'(x, dog'(x), bark'(x)).

Figure 4.67: Elementary LTAG Trees and Semantic Representations of every dog barks

Now, as we saw above, when [Kal02] analyses the semantics of French quantifiers, she argues that MC-TAGs (e.g. associating quantifiers with two trees to account for their scope) shouldn't be used, because French quantifiers are usually treated as NP-adjuncts, and in order for their scope part to adjoin to S, non-tree-local MC-TAGs would be required. Because an unrestricted use of non-tree-local MC-TAGs has been shown to be much more powerful than TAG, she proposes instead the enriched derivation graph, which is close in power to TAG. On the other hand, in [JKR03] an alternative approach is taken, using flexible composition in combination with a restricted use of non-tree-local MC-TAGs (e.g. only adjunction of the scope part of quantifiers is allowed to be non-tree-local), which does not affect the generative capacity of the grammar.
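As a concrete illustration of the labels-and-holes computation described above for every dog barks, the following is a minimal sketch in Python. It is not the LTAG semantics machinery itself: the resolution of p1 and of the verb's argument variable is hard-coded to mirror the exposition, and the "plugging" step is simplified to the single disambiguation available for this sentence.

```python
# Minimal sketch of hole semantics for "every dog barks" (assumed simplification
# of the [Kal02, JKR03]-style computation; labels l1/l2/l3 and holes h1/h2 mirror
# the exposition above).

from itertools import permutations

formulas = {
    "l1": "every'(x, {h1}, {h2})",   # scope part of "every"
    "l2": "dog'(x)",                 # predicate-argument part, p1 already resolved to dog'
    "l3": "bark'(x)",                # verb, its argument already resolved to x
}
scope_constraints = [("l2", "h1"), ("l3", "h2")]   # label must end up in this hole

def pluggings():
    """Enumerate assignments of labels to holes that respect the constraints."""
    holes, labels = ["h1", "h2"], ["l2", "l3"]
    for perm in permutations(labels):
        plug = dict(zip(holes, perm))
        if all(plug[h] == lab for lab, h in scope_constraints):
            yield plug

for plug in pluggings():
    # Substitute the plugged formulas into the top-level proposition l1.
    print(formulas["l1"].format(h1=formulas[plug["h1"]], h2=formulas[plug["h2"]]))
    # -> every'(x, dog'(x), bark'(x))
```

Only one plugging survives the constraints here, corresponding to the single disambiguation noted above.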
The syntactic and semantic analysis of NP quantifiers is not at issue here; what is at issue is the syntactic and semantic analysis of discourse adverbials such as for example. We have illustrated the use of MC-TAGs for English quantifiers because [WJSK03] extend this analysis to for example. As noted above, [WJSK03] argue that for example behaves like a quantifier, in that its interpretation must take scope over a preceding predicate, which it abstracts. They suggest that for example can be associated with the MC-TAG in Figure 4.68, consisting of two auxiliary trees. They argue that the second auxiliary tree shown in the figure (the predicate-argument part) adjoins to the root of the S it modifies, while the first auxiliary tree shown in the figure (the scope part) adjoins to the root of the higher discourse unit. For example, this tree would adjoin to the root of the so tree in the derivation of example (4.42).

Figure 4.68: Elementary DLTAG Trees for for example

In fact, it seems natural to consider how [WJSK03]'s analysis of scope effects in for example extends to the interpretation of all discourse adverbials whose internal arguments are modified by quantifiers or are generic, including an example, a result, and every case in as an example, as a result, in every case, etc. For example, in (4.44), the scope effects of as an example can be analysed identically to for example in (4.42) above.

(4.44) John just broke his arm. So, as an example, he can't cycle to work now.

We leave the details of this extension for future work. Note briefly however that if a top-down traversal of the derivation tree is used, then [WJSK03]'s analysis of for example would require non-tree-local MC-TAGs, since the second auxiliary tree shown in the above figure adjoins to the root of one elementary tree (the S it modifies), and the first auxiliary tree adjoins to the root of a different elementary tree (the higher structural connective). On the other hand, if a bottom-up traversal of the derivation tree is used, then tree-local MC-TAGs result.
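To make the scope-taking interpretation in (4.43) concrete, here is a minimal executable sketch. It is not [WJSK03]'s formalism: propositions are just strings, the result relation supplied by so is a toy two-place predicate, and the variable names alpha and beta are illustrative stand-ins for the clause interpretations.

```python
# Toy sketch of (4.43): "for example" asserts that the modified clause's
# interpretation instantiates the property obtained by abstracting it out of
# the relation supplied by "so". All representations here are assumptions.

alpha = "break(john, arm)"          # John just broke his arm
beta = "not(can(cycle(john)))"      # he can't cycle to work now

def result(effect, cause):
    """Toy representation of the relation contributed by 'so'."""
    return f"result({effect}, {cause})"

# Step 1: abstract the modified clause's slot, yielding lambda X. result(X, alpha).
abstracted = lambda X: result(X, alpha)

# Step 2: "for example" asserts that beta instantiates that property.
def exemplify(instance, prop):
    return f"exemplify({instance}, lambdaX.{prop('X')})"

print(exemplify(beta, abstracted))
# -> exemplify(not(can(cycle(john))), lambdaX.result(X, break(john, arm)))
```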
As illustration, the derivation tree for [WJSK03]'s analysis of for example is shown as the first tree in Figure 4.69, where one node signifies the scope part of the MC-TAG and another signifies its predicate-argument part. In a bottom-up traversal, we can view the predicate-argument part as composing into the tree of the clause it modifies; the result is a derived tree together with a scope part, both of which then compose into the tree of the higher structural connective.

The second derivation tree shown in Figure 4.69 represents an analysis of for example that incorporates its internal predicate-argument structure. In this case, the scope and predicate-argument parts of an MC-TAG are associated with the generic NP example. In this case, we cannot simultaneously employ both tree-local MC-TAGs and a bottom-up traversal. In a bottom-up traversal, the parts of the adverbial compose first, the resulting derived tree must compose into the tree of the clause it modifies, and when that result composes into the tree of the higher structural connective, the scope part adjoins at its root; this is a non-tree-local use of MC-TAGs akin to the analysis of quantifiers in [JKR03], which does not affect the generative capacity of the grammar. On the other hand, flexible composition allows us to start at any node, so the composition can be ordered such that both parts of the MC-TAG compose with the same elementary tree; this is a tree-local use of MC-TAGs.

Figure 4.69: Derivation Trees for PP Discourse Adverbials with Quantified Internal Arguments

We end this section with the comment that the analyses presented above extend to larger discourses, such as in (4.45), but also introduce additional considerations. We illustrate this with two possible derivation trees that can be produced for (4.45), shown in Figure 4.70, along with a possible derived tree.

(4.45) Mary found a job. Then Mike got a raise. Consequently they had enough money to buy a house.
In the first derivation tree in Figure 4.70, we show multiple adjunction at a single node. In [Kal02] multiple adjunction at a single node is avoided when possible, as combining MC-TAGs with the unrestricted use of multiple adjunction at a single node goes beyond the power of LTAG. It may be that in DLTAG too, multiple adjunction at a single node should be avoided, depending on the scoping possibilities of the relations conveyed by discourse connectives in discourses containing multiple discourse connectives; this is an issue for future study. In the second derivation tree in Figure 4.70, we have instead shown the two connective trees adjoining at distinct nodes.[27]

Note however that one interpretation produced by both these derivation trees is that Mike and Mary having enough money to buy a house is a consequence of Mike getting a raise. However, another interpretation of this discourse could be that Mike and Mary having enough money to buy a house is a consequence of both Mike getting a raise and Mary finding a job. This interpretation highlights an additional possible derivation tree for this discourse, namely one in which the second connective tree adjoins to the root of the first. The potential interaction between the resolution of the left argument of consequently and the adjunction site of the second connective tree is not yet fully understood. Although, as discussed above, [WJSK03] present three cases describing how the relations supplied by discourse adverbials and structural connectives can interact, the more general question of how the resolution of the anaphoric arguments of discourse connectives can be restricted by the structural description of the discourse has not yet been addressed. In fact, it may well be that this interaction is best addressed via consideration of discourse deixis research, which has already shown that the resolution of discourse deixis interacts with the structural description of the discourse (see Chapter 2). We will address similar remaining questions in Chapter 6.

The discussions above indicate that we may want to allow some ambiguity in the interpretation of the substitution arguments of both the higher and the lower connective tree, and we may want to allow for some ambiguity in the interpretations of the left arguments of the discourse adverbials then and consequently, but possibly only in some cases, and possibly only with respect to their interaction with the interpretation of prior structural relations.

[27] The discourse adverbial then can be represented structurally akin to consequently.
Figure 4.70: DLTAG Derived and Derivation Trees for (4.45)

4.3.4 Comparison of Approaches

The great potential for ambiguity that arises in discourse is due in part to the fact that the semantic type of all discourse units is the same. Thus, while we have shown that [Kal02]'s enriched derivation structure can be used at the discourse level to yield compositional semantics on a DLTAG enriched derivation graph, we have also shown that flexible composition in the derivation tree is likely a more parsimonious approach. However, the consideration of more complex discourses is needed before a compositional semantics for the DLTAG derivation tree is complete.

All of the approaches to building a syntax-semantics interface for discourse that have been discussed in this section are exploratory, rather than conclusive. Nevertheless, the essential theory behind each is well-defined, enabling their similarities and differences to be compared.
Apart from the definitional inconsistencies discussed above, [Gar97b]'s approach to building a syntax-semantics interface is a viable one that builds on feature-based approaches to aspects of compositional semantics that have already been proposed at the clause level. Gardent's discourse grammar (DTAG) is actually quite similar in many respects to DLTAG, in that it uses trees and the operations of substitution and adjunction for combining them. In fact, it appears that all of the elementary R trees that Gardent exemplifies could be represented as shown in Figure 4.71, e.g. with one explicit adjunction site and one explicit substitution site, where each node is associated with the feature structure shown in Figure 4.26.

Figure 4.71: Another Representation of the R Tree in Figure 4.26

Of course, this is not a "recursive" structure, because the foot and root nodes are not of the same category. Moreover, there are thus no "anaphoric" discourse relations in Gardent's grammar, which [WJSK03] have shown to be necessary to limit the computational power required by discourse structure to that of a tree-based grammar, as discussed above. Furthermore, [Gar97b]'s approach (DTAG) relies on the definition of discourse relations as feature structures.

The major difference between DTAG and DLTAG, and indeed between DLTAG and all other approaches to discourse structure and interpretation discussed in Chapter 2, is that DLTAG views cue phrases themselves as the anchors of the elementary "relation" trees involved in the construction and interpretation of the discourse model. These anchors have a predicate-argument structure and a meaning which conveys a relation between their arguments. While it is likely that some aspects of discourse relations are best modeled as features, DLTAG argues that basic predicate-argument relations are also involved. Because discourse connectives are, at the most basic level, predicates in DLTAG, we would like to make use of the structures that retain predicate-argument information when building compositional semantics. Although DLTAG has already argued that the structures needed at the discourse level are much simpler than those needed at the clause level [WJ98], our exploratory extension of [Kal02]'s approach to
DLTAG already appears to provide more possibilities than are needed at the discourse level; as we saw above, for example, many of the additional edges of the enriched derivation graph may be vacuous. The [JKR03] approach, based on traversal of the derivation tree, appears to be more viable for DLTAG.

Features recorded in the derived tree could also be employed in DLTAG, however, perhaps akin to the way they are employed at the clause level in FTAG, e.g. as indicating constraints on properties of arguments and predicates that must be present in order for their trees to combine with each other. For example, in FTAG the presence of a mode feature (present in elementary verb trees) is required in the argument of the think tree, indicating that this argument must be a clause, not a subclausal constituent. Moreover, while predicate-argument aspects of compositional semantics can be defined with respect to the derivation tree (or the enriched derivation graph), it might be necessary to use a combination of the derivation tree and the derived tree for anaphor resolution, if the derived tree defines some notion of locality or distance for anaphor resolution (e.g. the right frontier) that is not as obvious with the derivation tree. A wholly feature-based approach, however, ignores the predicate-argument structure of connectives, and moreover might require an extremely large set of features to represent their idiosyncratic meanings. In contrast, the association of lexical items with tree anchors maintains consistency with the clause level and the lexicon. However, [Kno96] has already shown that the relations imparted by a wide variety of cue phrases can be viewed in terms of a limited set of features (although the idiosyncratic meaning of these cues is still lost). DLTAG semantics could make use of these features when building compositional semantics based on the derived tree. Knott's approach also lends itself well to the possibility of lexicalizing inference and representing it, once computed, both structurally and semantically as feature structures within or in addition to the feature structures of the empty (or other structural) connective.[28]

Other differences between the two approaches are less significant. For example, Gardent motivates her approach by the desirability of the incremental construction of discourse interpretation, but as she notes, the substitution operation already permits some "lookahead" with respect to the building of intermediate trees; fully incremental tree construction is not possible in LTAG, nor is it necessarily desirable. In addition, both approaches preserve monotonicity in the semantics.

[28] This suggestion was originally made by Aravind Joshi in the DLTAG meetings at the University of Pennsylvania.
The derivation tree is one way of preserving the monotonicity of compositional semantics even while allowing nonmonotonicity in the syntactic structure, and the use of top and bottom feature structures is another. Thus, the two approaches are largely complementary, and a complete syntax-semantics interface for DLTAG will likely combine aspects of both approaches.

4.3.5 Summary

In this section we have discussed how the DLTAG grammar builds discourse directly on top of the LTAG clause grammar. We then discussed a syntax-semantics interface that has been proposed for a discourse grammar (DTAG) similar to DLTAG, and we also discussed extensions to DLTAG of the LTAG interfaces presented in Section 4.2. We then compared the approaches, and concluded that aspects of both will likely play a role in a complete syntax-semantics interface for DLTAG.

4.4 DLTAG Annotation Project

Because in both the discourse-level and the clause-level parse only one of the arguments of a discourse adverbial comes compositionally, the other must be retrieved from the discourse. DLTAG views this as a problem of anaphora resolution. As with other anaphora, developing algorithms capable of resolving these arguments in a way that reflects their actual distribution in discourse requires developing an annotated corpus. In this section, we present an overview of the DLTAG annotation project and discuss two preliminary studies that have already been performed in anticipation of the larger project.

4.4.1 Overview of Project

The main objective of the DLTAG annotation project is to build a corpus with discourse annotation. While not a complete representation of discourse structure, this project addresses a rich intermediate level between high-level discourse structure and clause structure that can be reliably annotated, namely, the syntax and semantics associated with discourse connectives. We use the Penn Treebank annotated corpus, which contains naturally occurring data from a variety of sources, has already been annotated for clause structure and part-of-speech, and is currently being annotated for
predicate-argument structure. Our annotation schema will be designed to build an additional layer of discourse annotation into the Penn Treebank corpus, with links to this clause-level information.

Discourse annotation will occur in two stages. First, the DLTAG parser [FMP+01] is used to parse the discourse, i.e. to identify the structural arguments of each discourse connective. This parse will incorporate any clause-level annotation that has already bracketed the S-internal arguments of structural connectives. Second, human annotators correct any errors in the parse, and add annotation tags for the anaphoric arguments of the discourse adverbials. This two-stage strategy has already proved successful in the clause-level annotation of the Penn Treebank corpus with respect to minimizing human effort. Once complete, this annotation can be used, along with the syntactic and predicate-argument annotation already in the Treebank, to develop anaphora resolution algorithms for adverbial discourse connectives. It can also be used to train a statistical version of the DLTAG parser to select the most likely parse from among the many possible structural connections, and it can be used for further research and development of NLP applications.

In general, preliminary development of the DLTAG annotation project involves first determining an initial set of discourse connectives to be annotated. While it is hoped that eventually all lexical items functioning as links between clausal units will be annotated, initially the project will likely focus on high-frequency connectives. A reliable annotation schema and annotation guidelines will also be developed, consisting of a set of annotation tags and procedures for their use. Moreover, a "semantic frame" will be built for each connective, detailing the semantic properties of each connective and its arguments. In the remainder of this section we discuss in more detail the kinds of preliminary studies addressing these issues that will be performed for each connective to be annotated in the DLTAG annotation project.

4.4.2 Preliminary Study 1

There are at least two types of preliminary studies involved in the annotation of discourse connectives and their arguments. The first type of study has recently been illustrated in [CFM+02]. There are three goals for this type of study. First, one simply wants to investigate the syntactic properties of discourse connectives and their arguments. In [CFM+02], nine discourse connectives were selected for study, shown in Table 4.1.
As shown, three of the selected connectives convey a resultative relation, three convey an additive relation, and three convey a concessive relation. Three different annotators each annotated seventy-five tokens of one connective from each set.

Table 4.1: Nine Connectives Studied in [CFM+02]
  Resultative: as a result, so, therefore
  Additive: also, in addition, moreover
  Concessive: nevertheless, whereas, yet

The annotation schema consisted of four annotation tags. First, each annotator annotated the boundaries of the left and right argument of each connective, using the XML tags <ARG> ... </ARG> to annotate the left argument, and <CONN> ... </CONN> to annotate the right argument, which also contained the connective. The remaining annotation tags, illustrated in Table 4.2, were used to capture features that would be automatically derivable from a parsed corpus. As shown in the first column, each annotator annotated the syntactic type of the left argument with a TYPE tag. As shown in the second column, each annotator also annotated the presence of other discourse connectives and punctuation that co-occur with the connective being annotated. As shown in the third column, each annotator also annotated the position of the discourse connective in the containing sentence.

Table 4.2: Annotation Tags for the Nine Connectives Studied in [CFM+02]
  TYPE: MAIN (sentence), MULT (multiple sentences), SUB (subordinate clause), XP (sub-clausal constituent)
  COMB: PERIOD, COMMA, (SEMI-)COLON, AND/BUT
  POSITION: INITIAL, MEDIAL, FINAL

An example from the corpus study is shown in (4.46). As shown, the discourse connective being annotated is as a result. Its left argument is the prior sentence, there are no other discourse connectives within its left or right argument except the empty connective (signaled by the period), and it occurs sentence-initially.

(4.46) <ARG TYPE=MAIN>Your July 26 editorial regarding the position of Attorney General Robert F. Kennedy on prospective tax relief for DuPont stockholders is based on an erroneous statement of fact</ARG>. <CONN COMB=PERIOD POS=INITIAL>As a result, your criticism of Attorney General Robert F. Kennedy and the Department of Justice was inaccurate, unwarranted and unfair</CONN>.

It was found that feature percentages varied across the discourse connectives. For example, so always occurred sentence-initially, nevertheless often took a sub-clausal constituent (XP) as its left argument, and therefore often took a subordinate clause (SUB) as its left argument.
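Because the features in Table 4.2 are recorded as attributes on the <ARG> and <CONN> tags, they can be pulled out of an annotated file with a few lines of code. The following is a minimal sketch, not the project's actual tooling; the sample string is an abbreviated version of (4.46), and the one-to-one pairing of <ARG> and <CONN> tags is a simplifying assumption.

```python
import re

# Abbreviated version of example (4.46), using the <ARG>/<CONN> schema above.
annotated = (
    "<ARG TYPE=MAIN>Your July 26 editorial ... is based on an erroneous "
    "statement of fact</ARG>. <CONN COMB=PERIOD POS=INITIAL>As a result, "
    "your criticism ... was inaccurate, unwarranted and unfair</CONN>."
)

def extract_features(text):
    """Collect the TYPE, COMB and POS attribute values for each annotated token."""
    records = []
    for arg_attrs, conn_attrs in zip(re.findall(r"<ARG ([^>]*)>", text),
                                     re.findall(r"<CONN ([^>]*)>", text)):
        feats = dict(re.findall(r"(\w+)=(\S+)", arg_attrs + " " + conn_attrs))
        records.append(feats)
    return records

print(extract_features(annotated))
# -> [{'TYPE': 'MAIN', 'COMB': 'PERIOD', 'POS': 'INITIAL'}]
```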
Each of the annotators then annotated an additional twenty-five tokens of one discourse connective from each semantic set: as a result, in addition, and nevertheless. It was found that the initial patterns of feature percentages remained stable, indicating that these connectives display patterns with respect to these features that are systematic enough to aid in automatic argument detection.

The second goal in this type of study is to test inter-annotator reliability with respect to argument annotation. For [CFM+02], three additional annotators each annotated the left argument of those twenty-five additional tokens of as a result, in addition, and nevertheless. For this annotation, the TYPE tag was replaced with a slightly more general LOC tag, whose possible values are shown in Table 4.3.

Table 4.3: LOC Tag Values
  SS (same sentence)
  PS (previous sentence)
  PP (previous adjacent sentences)
  NC (non-contiguous)

The LOC tag defines the sentence, consisting of a main clause and any attached subordinate or adjoined clauses, as the minimal atomic unit from which the left argument is derived. The values of the LOC tag distinguish arguments derived from the sentence containing the connective (SS), the single prior adjacent sentence (PS), any sequence of adjacent sentences (PP), or a sentence or sequence of sentences not contiguous to the clause containing the connective (NC). This tag is more general than the TYPE tag in that it does not ask annotators to distinguish sub-clausal or subordinate clause constituents; on the other hand, it adds the information of whether the argument is contiguous with the sentence containing the connective. All other features were automatically derivable, so the additional annotators did not annotate them.

Table 4.4 shows the inter-annotator results. The first column indicates four-way agreement, i.e. cases where all four annotators labeled the left argument with the same LOC value. As shown, four-way agreement is greater than 50% in all cases.
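The agreement categories reported in Table 4.4 (and explained column by column below) can be computed mechanically from the four LOC labels assigned to each token. The following is a minimal sketch with hypothetical label data, not the study's actual scoring script.

```python
from collections import Counter

def agreement_category(labels):
    """Classify one token's four LOC labels as '4', '3', '[2,2]' or '2' agreement."""
    counts = sorted(Counter(labels).values(), reverse=True)
    if counts[0] == 4:
        return "4"
    if counts[0] == 3:
        return "3"
    if counts == [2, 2]:
        return "[2,2]"
    return "2"   # at most two annotators agree, the rest disagree

# Hypothetical labels from four annotators for three tokens of one connective.
tokens = [
    ["PS", "PS", "PS", "PS"],   # four-way agreement
    ["PS", "PS", "PS", "PP"],   # three-way agreement
    ["PS", "PS", "PP", "PP"],   # two pairs: [2,2]
]

summary = Counter(agreement_category(t) for t in tokens)
print({cat: f"{100 * n / len(tokens):.0f}% ({n})" for cat, n in summary.items()})
```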
The second column indicates three-way agreement. The third column, [2,2], indicates the case where two annotators agreed on one LOC value, and the other two annotators agreed on another LOC value; the fourth column indicates that only two annotators agreed on a LOC value. Adding the first two columns shows that annotation of the arguments of these connectives can be done reliably; majority agreement (three-way or better) is 92% for in addition, 96% for as a result, and 88% for nevertheless.

Table 4.4: Inter-Annotator Agreement
  Connective     4/4        3/4       [2,2]     2/4
  in addition    76% (19)   16% (4)   4% (1)    4% (1)
  as a result    84% (21)   12% (3)   4% (1)    0
  nevertheless   52% (13)   36% (9)   12% (3)   0

The third goal in this type of study is to see what the sources of disagreement teach us about the annotation guidelines that will be needed and the sorts of resolution algorithms that can be constructed based on syntactic patterns. One guideline was developed based in part on an initial "exact match" comparison between the left argument boundaries annotated by each of the four annotators for the three connectives mentioned above. It was found that the annotators were using different underlying assumptions when deciding on the size and syntactic form of the left argument. For example, (4.47) contains a discourse of the form cause-result-result. One annotator might annotate both the cause and the first result as the left argument of as a result, while another annotator might annotate only the first result as the left argument of as a result.

(4.47) [Lee won the lottery. [So he was happy]]. As a result, his blood pressure went down.

This type of disagreement can be reduced by the use of a minimal unit guideline.
If annotators are instructed to annotate the minimal unit which could serve as the left argument of a connective, then most annotators would annotate the first result only in (4.47). The relation between the first and second sentence is not lost, for so will take as its left argument the first sentence, and take the second sentence as its right argument. Similarly, in (4.48) one annotator might annotate just the adjective overworked as the left argument of as a result, while another annotator might annotate the entire preceding clause (John is overworked).

(4.48) [John is [overworked]], and as a result, tired.

Whether the minimal unit guideline applies in this case depends on how "minimal unit" is defined. If it is defined as the minimal clause, then the only option for the annotator is to annotate the entire preceding clause. If it is defined as any phrasal constituent, then the annotator would annotate the adjective. (4.48) also illustrates a possible syntactic resolution heuristic for the left argument of discourse adverbials: in all but one case in this corpus study, when a coordinating conjunction linked two clauses and the second clause contained a discourse adverbial, the discourse adverbial and the coordinating conjunction both took the same LOC value for their left argument.

4.4.3 Preliminary Study 2

The second type of preliminary study involved in the DLTAG annotation project concerns the identification of lexico-syntactic features that distinguish the contextual arguments of discourse connectives from other discourse units in the context. Such a study is currently being prepared for the discourse adverbial instead for [CFM+03]. In this study, four annotators each annotate the left argument of twenty-five different tokens of instead drawn from the Penn Treebank corpus, yielding a total of one hundred annotated tokens. Again, the XML tags <ARG> ... </ARG> are used. To retrieve the left (contextual) argument, each annotator is instructed to instantiate it as an explicit nominalized AO. For example, in (4.49), the bracketed argument could be instantiated as Instead of offenders being released locally. To ensure inter-annotator reliability, each annotator is additionally annotating one set of twenty-five tokens that has been annotated by another annotator, such that each set is annotated by two annotators.
(4.49) The prison is a big employer, and people were reassured that <ARG>no offender would be released locally</ARG>. Instead, they would be released from the prison they had first been sent to.

Moreover, it has been noted that the contextual clause argument of instead often has a negative subject, object, or verb, or the verb may be modal, as in (4.49), where the contextual argument contains the negative subject no offender and the modal would; or, as in (4.50), the left (contextual) clause argument may be embedded beneath a non-factive verb (encouraged).[29]

(4.50) Their broker encouraged them <ARG>to take a month in Europe</ARG>; they moved to South Carolina, where they began building a dream house on the beach.

What all of these features appear to have in common is that in one way or another they allow their clause to be interpreted with respect to an alternative set of propositions. Non-factives, for example, do not presuppose the truth of their complement clause; it can be either true or false. Because the event of them taking a month in Europe in (4.50) is not asserted to occur, instead can, and does, assert the occurrence of an alternative event (e.g. them building a dream house).

Each annotator is thus also annotating the presence of these features for each of their twenty-five tokens. In (4.49)-(4.50), note that the contextual alternative clauses do not have these features. Each annotator is thus also annotating competing clauses in the context for these features, to ascertain whether these and additional features do in fact distinguish the left (contextual) argument of instead from other contextual clauses.[30]

4.4.4 Future Work

Essentially, the preliminary studies described above are the same kinds of studies already done at the clause level to determine the predicate-argument structures of verbs. Future work will include doing these types of studies for all connectives in the corpus, thereby enabling further refinement of the DLTAG annotation tags and guidelines. These studies will also enable "semantic frames", modeled after those used in clause-level predicate-argument tagging (see [KP02]), to be constructed for each discourse connective, incorporating its predicate-argument structure, its meaning, any constraints on the AO interpretation of its arguments, and any lexico-syntactic features that distinguish its argument from alternatives.

[29] These features were originally noted by Bonnie Webber in a DLTAG meeting.
[30] Results and further details concerning this study are reported in [CFM+03].
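As a rough illustration of what such a frame might record, the sketch below encodes a hypothetical entry for as a result as a small data structure. The field names, the relation notation, and the listed features are illustrative assumptions, not the project's actual frame format.

```python
from dataclasses import dataclass, field

@dataclass
class ConnectiveFrame:
    """Hypothetical 'semantic frame' for a discourse connective (illustrative only)."""
    lemma: str
    relation: str                      # idiosyncratic meaning conveyed between the arguments
    arity: int = 2                     # left (Arg1) and right (Arg2) abstract-object arguments
    arg_constraints: dict = field(default_factory=dict)
    distinguishing_features: list = field(default_factory=list)

as_a_result = ConnectiveFrame(
    lemma="as a result",
    relation="result(Arg2, Arg1)",     # assumed argument order: Arg2 presented as a result of Arg1
    arg_constraints={
        "Arg1": "abstract object (eventuality/proposition) recovered from the context",
        "Arg2": "abstract object: interpretation of the clause the adverbial modifies",
    },
    distinguishing_features=["position in sentence (POS)", "co-occurring punctuation/connectives (COMB)"],
)

print(as_a_result.relation)
```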
These semantic frames will serve as a basic semantics for each connective, to help the annotators determine the relevant semantic roles played by the context around them in the corpus discourses.

4.5 Conclusion

In this chapter, we focused on the construction of LTAG and DLTAG trees, and the computation of compositional discourse semantics from these structures. Two syntax-semantics interfaces that have been proposed for LTAG were presented and compared. We also discussed a syntax-semantics interface that has been proposed for DTAG, a discourse grammar similar in some respects to DLTAG. Drawing on these interfaces, we discussed how Chapter 3's discussion of the semantic mechanisms underlying the predicate-argument structure and interpretation of discourse adverbials can be incorporated into a syntax-semantics interface for DLTAG. We also discussed the resolution of the anaphoric arguments of discourse adverbials, framing this discussion in terms of a large DLTAG annotation project currently underway.
Chapter 5

Other Ways Adverbials Contribute to Discourse Coherence

5.1 Introduction

We conclude our investigation of S-modifying ADVP/PP adverbials by emphasizing that we are not claiming that it is only due to their argument structure and interpretation that adverbials establish or contribute to discourse coherence. For example, the argument structure of the ADVP adverbials actually and really cannot explain why they have been treated as discourse connectives (see [KP02, Kno96]); as discussed in Chapter 3, these adverbials take only one AO argument: the interpretation of the modified clause, whose truth or fact is asserted to be actual or real. Similarly, the argument resolution of the PP adverbials in any case and in fact cannot explain why they have been treated as discourse connectives (see [KP02]); as discussed in Chapter 3, their internal indefinite or generic NP arguments denote unspecified (sets of) entities, and are not in and of themselves referential.

In this chapter we explore other explanations for why such adverbials can require discourse context for their interpretation; in particular, those that involve the interaction of their semantics with other aspects of discourse coherence. In Section 5.2 we introduce prosody as a semantic mechanism of discourse coherence and discuss prior analyses of (topic) focus. In Section 5.3 we investigate what prior research has called the focus sensitivity of certain modifiers, and discuss
how focus effects in both clausal and discourse (S-modifying) adverbials contribute to discourse coherence. In Section 5.4 we introduce Gricean implicature as an additional aspect of meaning that arises from the assumption of discourse coherence and discuss how prior analyses have accounted for it and distinguished it from the related notion of presupposition. In Section 5.5 we suggest how clausal and discourse (S-modifying) adverbials can be used to convey implicatures. In Section 5.6 we point to additional mechanisms that must also be considered, both alone and in relation to "discourse connectives", in order to construct a complete model of discourse.

5.2 Focus

The use of prosody to convey meaning is a very active area of current research. There are issues involved in these analyses that are beyond the scope of this thesis; in this section we illustrate two semantic analyses of focus that assume tree-based grammars akin to DLTAG, and we also highlight an approach using categorial grammars that incorporates the insights of both these analyses. [Gar97a] cites other approaches to the analysis of focus, in particular [Pul97].

5.2.1 The Phenomena

The term focus[1] is used to refer to the prosodic emphasizing of parts of utterances for communicative purposes. Focus is typically expressed in spoken language by pitch movement, duration, or intensity on a syllable (see [Kri, Lad66, Ste00a]). In addition, certain specific syntactic constructions, such as the English cleft sentence shown in (5.1), make use of focus to achieve communicative effects (see [Pri86]). Here and below we use capital letters to mark the focused phrase, unless a particular analysis employs a different representation.

(5.1) It was BILL that she invited for dinner.

There are also languages that make use of specific syntactic positions (e.g. the preverbal focus position in Hungarian), dedicated particles (e.g. Quechua), or syntactic movement (e.g. Catalan) to achieve the communicative effects of focus ([Kri]).

[1] There is another use of the term focus, which refers to discourse referents that are salient at the current point of discourse and are potential antecedents for pronouns (see [GS86]).
To illustrate the discourse effects of focus, consider the examples discussed by [Roo95a, Gar97a].

Question-Answer Congruence

The questions in (5.2) can be answered by the answers in (5.3) non-contrastively (a) or contrastively (b), but not by the answers in (5.4):

(5.2 a) Who did Mary invite for dinner?
(5.2 b) Did Mary invite Bill or John for dinner?
(5.3 a) Mary invited BILL for dinner.
(5.3 b) Mary didn't invite BILL for dinner, but JOHN.
(5.4 a) Mary invited Bill for DINNER.
(5.4 b) Mary didn't invite Bill for DINNER, but JOHN.

The answers in (5.3)-(5.4) are identical except for the position of focus. The position of focus thus correlates with the WH-phrase or disjoined alternatives in questions. However, (5.4 b) has an additional contrastive reading that Mary invited Bill for something other than dinner; in this case the contrastive focus on dinner does not coincide with the position of the WH-phrase or disjoined alternatives. See [Kri92] for an analysis of multiple focus constructions.

Reasons and Counterfactuals

Imagine that John and Mary are friends. John finds out that he will inherit a fortune if he marries within the year. He arranges to marry Mary because going through the process of finding someone else to marry is, in his opinion, too time-consuming. Under these circumstances, the set of sentences in (5.5) are easily accepted as true.

(5.5 a) The reason John MARRIED Mary was to qualify for the inheritance.
(5.5 b) The reason John married MARY was to avoid a time-consuming process.
(5.5 c) If John hadn't MARRIED Mary, he might not have gotten the inheritance.
The set of sentences in (5.6) are more likely perceived to be false.

(5.6 a) The reason John married MARY was to qualify for the inheritance.
(5.6 b) The reason John MARRIED Mary was to avoid a time-consuming process.
(5.6 c) If John hadn't married MARY, he might not have gotten the inheritance.

Again, the sentences in (5.5)-(5.6) are identical except for the position of focus. The position of focus can thus influence the truth-conditions of reasoning and counterfactual statements.

Conversational Implicature

Imagine that Mary and John just received their report cards. Their mother asks Mary about their Economics grades. If Mary answers with (5.7 a), she gives the impression that John barely passed. If Mary answers with (5.7 b), she gives the impression that she did not pass.

(5.7 a) Well, John PASSED.
(5.7 b) Well, JOHN passed.

Again, the sentences in (5.7) are identical except for the position of focus. The position of focus can thus affect the implicatures we draw from what we say.

5.2.2 Information-Structure and Theories of Structured Meanings

[Cho76] observed that focused constituents pattern syntactically like quantifiers and WH-constituents with respect to crossover phenomena. For example, suppose he refers to John. Then while (5.8) has only a coreferential reading, (5.9) has both a coreferential and a bound variable reading.

(5.8) We only expected the woman he_i loves to betray HIM_i.
(5.9) We only expected HIM_i to be betrayed by the woman he_i loves.

The coreferential reading is distinguished as follows: We expected John to have the property λx. x is betrayed by the woman John loves. The bound variable reading is distinguished as follows: We expected John to have the property λx. x is betrayed by the woman x loves.
[Cho76] thus proposes that focused constituents be analyzed like WH-constituents and quantifiers. In other words, they must move and they are assigned scope at the interpretation level of Logical Form (LF), such that an empty variable in the surface position of the scoping element is bound by a lambda operator in the semantic interpretation. As noted by [Roo95a], in Chomsky's view focus has the force of an equality, expressed in terms of a definite description as "the x is John". The LF for (5.8) is represented as shown in (5.10).

(5.10) We (only (HIM_i) (λ_i expected the woman he_i loves to betray e_i))

Chomsky's bound variable principle then accounts for the crossover effects in (5.9), by allowing two LFs, shown in (5.11 a)-(5.11 b). Chomsky's bound variable principle is as follows: at LF, the phonological content of a pronoun may be optionally deleted if it is c-commanded by a co-indexed empty variable.

(5.11 a) LF: We (only (HIM_i) (λ_i expected e_i to be betrayed by the woman e_i loves))
(5.11 b) LF: We (only (HIM_i) (λ_i expected e_i to be betrayed by the woman he_i loves))

A problem for this analysis is that focus movement in English is not governed by general movement constraints. In particular, it can violate island constraints, as shown in (5.12), which cannot be violated by WH-constituent or quantifier movement, as shown in (5.13)-(5.14) [Gar97a, Roo85].

(5.12) They investigated [the question of whether you know the woman who chaired THE ZONING BOARD].
(5.13) *(Which board)_i did they investigate [the question of whether you know the woman who chaired e_i]?
(5.14) *They investigated [the question of whether you know the woman who chaired every board in town] (where every board in town scopes over the woman)

In other research circles, focus is analyzed in terms of an additional partitioning of the clause, although the theoretical basis for this varies according to the background theory of the researcher. For example, as summarized by [vH00], [vdG69] introduces the pair psychological subject-predicate, viewing psychology as the ultimate basis for language structure, while the Prague School [Fir64] uses the terms theme-rheme and topic-comment, which are both borrowed from traditional rhetoric
and philology, and are re-envisioned by the American Structuralists, who analyze focus as expressing what is new in an utterance (see [Hal67], [SHP73]). In this view, the question Who did Mary invite for dinner? can be answered by Mary invited BILL for dinner because the answer retains the given information that Mary invited someone for dinner and supplies the new information that this person is Bill. Bill is thus accented and other constituents are de-accented. [Pri81] argues, however, that a binary distinction between the familiarity of information inadequately accounts for numerous discourse phenomena, and proposes a five-way taxonomy. [Cho71] and [Jac72] rephrase the given-new distinction in terms of presupposition-focus, stressing their semantic-pragmatic nature. [Jac72] introduces the notion of focus as a semantic feature.

Theories of "structured meanings" [Kri92, vS82] combine the focus movement approach with contemporary theories of information structure, reformulating the distinction in terms of background-focus, and producing a semantic account of focus such that phrases differing in the location of focus have different semantic values. They assume that the presence of a focus feature causes the focused expression (F) to be moved out of its original position, leaving a trace at LF, which is interpreted as a variable. The background (B) consists of the remainder of the clause with a lambda abstraction over the variable left by the focus. This background corresponds to presupposed information that is given, or can be accommodated, in the context, and is also related to what [Pri86] calls the open proposition of an utterance [Ste00a]. The focus constitutes the new information.

In structured meaning theories, the result of a clause with a single focus is a structured meaning: a pair consisting of (1) a property (B) obtained by abstracting the focused position, and (2) the semantics of the focused phrase. The property in (5.15 a), for example, is the property of being introduced by John to Sue, and b is the individual denoted by Bill, yielding the structured meaning in (5.15 b).

(5.15 a) John introduced BILL to Sue.
(5.15 b) (λx.[introduce(j,x,s)], b)

The property in (5.16 a) is the property of being an x such that John introduced Bill to x, and s is the individual denoted by Sue, yielding the structured meaning shown in (5.16 b).

(5.16 a) John introduced Bill to SUE.
(5.16 b) (λx.[introduce(j,b,x)], s)
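The background/focus pairs in (5.15 b) and (5.16 b) can be represented directly with a lambda for the background and a constant for the focus. The following is a minimal sketch, assuming a toy setting in which propositions are just strings; it is not a full structured-meaning semantics.

```python
# Structured meaning of a clause with a single focus: a pair (background, focus),
# where the background abstracts the focused position.

def introduce(agent, theme, goal):
    return f"introduce({agent},{theme},{goal})"

# (5.15) John introduced BILL to Sue.
background_515 = lambda x: introduce("j", x, "s")   # lambda x. introduce(j, x, s)
structured_515 = (background_515, "b")

# (5.16) John introduced Bill to SUE.
background_516 = lambda x: introduce("j", "b", x)   # lambda x. introduce(j, b, x)
structured_516 = (background_516, "s")

# Applying each background to its focus recovers the same ordinary meaning,
# even though the two structured meanings differ.
assert structured_515[0](structured_515[1]) == structured_516[0](structured_516[1])
print(structured_515[0](structured_515[1]))   # -> introduce(j,b,s)
```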
5.2.3 Alternative Semantics

Alternative semantics [Roo85, Roo92, Roo95a] also seeks to produce a semantic account of focus such that phrases differing in the location of focus have different semantic values. However, this theory relies on a slightly different interpretation of focus than the partitioning effect represented in structured meaning theories, and consequently produces a wholly semantic analysis of the phenomena, which is performed in situ without any notion of focus movement. According to alternative semantics, evoking alternative propositions is the general function of focus. For example, a question like Who did Mary invite for dinner? asks for answers of the form Mary invited X for dinner, where X varies over persons. The answer, Mary invited BILL for dinner, identifies a particular answer of this form. Focus on an expression is viewed as marking the fact that alternatives to this expression are under consideration.[2] Thus the use of focus in alternative semantics is viewed not as distinguishing two parts of the clause (focus and background), but rather as a semantic feature that triggers the computation of an additional semantic value for the entire clause: an alternative set, of which the ordinary semantic value is a subset. During the interpretation process, the focus is left in situ, and the alternatives that are generated from the focused expression to yield the additional semantic value are computed wholly semantically. The notion of alternative sets has been employed in a variety of other research, including [Bie01, KKW01b, Ste00a].

As shown in (5.17)-(5.18), the focus feature on a focused expression α, indicated as α_F, triggers the assignment of both an ordinary semantic value [[.]]^o and a focus semantic value [[.]]^f: the set of semantic objects obtainable from the ordinary semantic value by making a substitution in the position corresponding to α_F.

(5.17) [[MARY_F likes Sue]]^o = like(m,s) ∈ D_t, where D_t is the domain of truth values.
(5.18) [[MARY_F likes Sue]]^f = {like(x,s) | x ∈ D_e}, where D_e is the domain of individuals.

More generally, the focus semantic value is defined recursively as shown in (5.19), where α represents the meaning of a lexical item, τ is the semantic type of α, and φ(α_1,...,α_n) represents the meaning of a complex phrase such as a clause or complex verb phrase.

[2] [SCT+94] use eye-tracking techniques to observe the construction of alternative sets during sentence processing.
(5.19) a. [[α_F]]^f = D_τ, where τ is the semantic type of α
       b. [[α]]^f = {[[α]]^o}
       c. [[φ(α_1,...,α_n)]]^f = {φ(x_1,...,x_n) | x_1 ∈ [[α_1]]^f, ..., x_n ∈ [[α_n]]^f}

As shown in (5.19 a), the focus semantic value of a focused lexical item is the set of semantic objects of the same semantic type as α. As shown in (5.19 b), the focus semantic value of a non-focused lexical item is the singleton set of its ordinary semantic value. As shown in (5.19 c), the focus semantic value of a complex phrase is computed compositionally from the focus semantic values of its component lexical items. Thus in (5.18) above, φ(α_1,...,α_n) would represent the proposition Mary likes Sue, and its focus semantic value would be the set of propositions of the form x likes Sue, where x ranges over individuals.

Restrictive alternative semantics [Roo92, Roo95a] has been introduced to handle cases where the alternative set that is used in the interpretation of focus is a subset of the focus semantic value of a proposition. In restrictive alternative semantics, C is used to represent this alternative set, denoting a syntactically covert free semantic variable, which focus evokes in a presuppositional way. When focus is used, the focus feature is interpreted by the focus interpretation operator ~, which constrains C and thereby handles the interpretation of focus as defined in (5.20).

(5.20) Where α is a syntactic phrase and C is a syntactically covert semantic variable, α ~ C introduces the presupposition that C is a subset of [[α]]^f containing [[α]]^o and at least one other element.

Like other free variables, C must find a referent. Focus interpretation contributes a constraint, but does not fix this referent uniquely. In each specific case, C is identified with some semantic or pragmatic object that is present for independent reasons. Identifying the variable with the appropriate object is a matter of anaphora resolution [Roo95a]. In cases of question-answer congruence, an antecedent for C is introduced by the semantics and/or pragmatics of questions: if we view the ordinary semantic value of a question as a set of possible answers, i.e. a set of propositions corresponding to potential answers, both true and false, then the antecedent for C can be the ordinary semantic value of the question itself.
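Before walking through the tea/coffee example below, the recursive definition in (5.19) and the congruence use of C can be made concrete with a small sketch. This is an illustrative toy implementation (propositions as strings over an assumed small set of alternatives), not Rooth's formal system.

```python
# Toy focus semantic values in the spirit of (5.19): the focus value of a clause
# with one focused argument is the set of propositions obtained by substituting
# alternatives for the focused item; everything else is held constant.

ALTERNATIVES = {"tea", "coffee", "juice"}     # assumed alternatives of the right type

def want(x, y):
    return f"want({x},{y})"

def focus_value_object_focus(subj, obj):
    """Focus semantic value of [S subj wants OBJ_F]: substitute in the object slot."""
    return {want(subj, alt) for alt in ALTERNATIVES | {obj}}

# Answer: "Eva wants COFFEE_F"
ordinary = want("eva", "coffee")                          # [[.]]^o
focus_set = focus_value_object_focus("eva", "coffee")     # [[.]]^f

# The question "Does Eva want tea or coffee?" denotes a set of possible answers,
# which can serve as the antecedent of the free variable C. Per (5.20), C must be
# a subset of the focus value containing the ordinary value and something else.
C = {want("eva", "tea"), want("eva", "coffee")}
print(C <= focus_set and ordinary in C and len(C) > 1)    # -> True
```

The same mechanics underlie the worked example discussed next.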
For example, the ordinary semantic value of the question Does Eva want tea or coffee? is the set containing the propositions Eva wants tea and Eva wants coffee. If the answer is Eva wants COFFEE, then the constraint introduced by ~ is that C be a set of propositions of the form Eva wants y containing at least Eva wants coffee and something else. If the answer instead had focus on Eva, the constraint would be that C is a set of propositions of the form x wants coffee. However, in this latter case, C would be inconsistent with the information independently contributed by the question.

5.2.4 Backgrounds or Alternatives?

Conceptually, a major difference between the structured meaning and alternative semantics approaches is the distinction between a background (or theme or presupposition) and a set of alternatives. The dialogue in (5.21) motivates the view that focus use partitions the clause into the background, which expresses given information (expressed in the question), and the focus, which gives new information (Fred). (5.22) and (5.23) motivate the view that focus yields the construction of alternative sets. In (5.22) there is no prior context at all; the focused expression is understood to express some contrast to other possible referents. In (5.23), these contrasting referents are explicit.

(5.21) Speaker A: Who did Sam talk to?
       Speaker B: Sam talked to FRED.
(5.22) Sam talked to FRED.
(5.23) Speaker A: Does Eva want tea or coffee?
       Speaker B: Eva wants COFFEE.

Both the structured meaning approach and the alternative semantics approach provide a single analysis of the above examples. In truth, we would like both analyses, focus-background and alternative sets, to play a role in a complete theory of focus interpretation. As evidence of this, [Kri] presents the dialogue in (5.24).

(5.24) Speaker A: My car broke down.
       Speaker B: What did you do?

A can answer with (1) I called a mechanic or with (2) I fixed it. If focus expresses newness, (1) should have focus on called a mechanic, and (2) should have focus just on fixed, as the car (i.e. the referent of it) is given. The lack of accent on it shows that givenness plays a role in accentuation;
pronouns are generally not accentable [Kri]. But the same use of focus can also indicate the presence of alternative actions A could have taken when he answers with either (1) or (2). One would like to represent all of this information.

As [Kri] notes, the processing power required for the construction of (context-dependent) alternative sets may make (restrictive) alternative semantics approaches less economical than structured meaning approaches, especially for cases in which multiple foci are involved, as in (5.25).

(5.25) Mary only invited BILL_1 for dinner. She also only invited BILL_1 for LUNCH_2.

[Roo95a], however, gives a detailed comparison of structured meaning and alternative semantic approaches, concluding that the pros and cons of each theory are nearly equally balanced. For our purposes, one aspect of the difference between these two theories is most significant. Structured meaning approaches form the background by treating the focus feature as a semantic operator, which must scope over a bound variable, forming an abstracted property, or background (or presupposition or open proposition), which is then applied to the focused phrase. However, in this semantics, relevant variations in the range of entities other than the focused entity to which the property can apply are not distinguished. For example, in (5.16 a), the background is the property of John introducing Bill to someone. The relevant range of people other than Sue that John could have introduced Bill to is not distinguished from the set of all individuals. The equivalent of this abstracted property in alternative semantics approaches is the focus semantic value of a clause, i.e. an alternative set of propositions produced by substituting alternatives in the position corresponding to the focused phrase. Again, the relevant alternatives that could be substituted for the focused phrase are not distinguished from the set of all individuals. Restrictive alternative semantics, however, employs a free variable C whose reference to a relevant alternative set in the context is fixed via anaphora resolution, and is constrained only to be a subset of the focus semantic value of the clause containing the focused phrase. Thus, restrictive alternative semantic approaches allow the range of alternatives to the focused phrase to be dependent on the context. We will return to this dependency and demonstrate its effects on S-modifiers in Section 5.3.
5.2.5 Contrastive Themes

[Ste00a] makes use of both partitions and alternative sets in his combinatory grammar-based analysis of focus, the aim of which is to produce a viable account of the syntax-phonology interface. In particular, [Ste00a] retains both theme-rheme and focus-background partitions, while incorporating the notion of "contextually-relevant" alternative sets.

In Steedman's analysis, the intonation patterns of an utterance establish a theme and a rheme. Essentially, the L+H* LH% tune (among others) is associated with a theme, and the H* LL% tune (among others) is associated with a rheme.[3] His theme-rheme distinction corresponds roughly to the focus-background distinction in structured meaning theories: themes convey presupposed information that is given in the prior context or can be accommodated, while rhemes convey new information. Steedman, however, represents the information presupposed by a theme in terms of an alternative set of propositions instantiated by different possible rhemes depending on the context, which he calls a "rheme alternative set". Moreover, in his analysis, the intonation patterns used to establish themes and rhemes may additionally convey that there is given and new information within both the theme and the rheme, which he calls background and focus, respectively. In other words, within both theme and rheme, focused phrases can be used to distinguish them from other alternatives in the context. (5.26) (from [Ste00a]) provides an example.

(5.26) Q: I know that Marcel likes the man who wrote the musical. But who does he ADMIRE?
       A: Marcel      ADMIRES       the woman who   DIRECTED    the musical.
                      L+H* LH%                      H*          LL%
          background  focus         background      focus       background
          .........theme.........   ......................rheme......................

[3] Steedman employs [PH90]'s tune notations; an exposition on prosodic tune representation and distinction will take us too far afield and will not be undertaken in this thesis; see [Ste00a] for a comprehensive discussion of the particular intonation patterns that can establish theme and rheme.

Boundaries of the theme and the rheme of the answer in (5.26) are indicated beneath the example; within
both this theme and this rheme there are background and focus constituents, as indicated. These partitions are conveyed by the prosodic tunes distinguishing the capitalized (focused) phrases. The theme of the answer in (5.26) presupposes a rheme alternative set, a set of propositions of the form Marcel admired X, where X is instantiated by a variety of individuals made available in the context, including the man who wrote the musical and other people relevant to the performance of a musical. The rheme background in (5.26) includes the given information that a musical is under discussion, while the rheme focus restricts the rheme alternative set to one proposition.

The theme in (5.26) is intonationally marked as also containing a focus and a background. The theme background establishes the theme, while the theme focus presupposes a set of alternative themes of the form Marcel Y'd X, where X is instantiated by the rheme and Y is instantiated by different theme foci in the context, including likes and admires. Steedman's analysis enables an account of a variety of different ways in which a theme can be established. In particular, a theme can be contrastive. For example, suppose the answer in (5.26) was Marcel HATES the woman who DIRECTED the musical, accompanied by the same intonation pattern as in (5.26). In this case, the same theme alternative set is constructed, but the speaker at one and the same time recognizes the theme required by the question and establishes a new theme.

5.2.6 Summary

In this section we have presented three analyses of focus as a semantic mechanism of discourse coherence. In particular, we have advocated the view that the prosodic highlighting of a syntactic constituent can cause the sentence in which it is contained to be interpreted with respect to the discourse, by invoking an alternative set whose members are interpreted with respect to the context. In the next sections, we will discuss how adverbial semantics can interact with focus semantics.

5.3 Focus Sensitivity of Modifiers

In this section we address how analyses of the focus sensitivity of sub-clausal modifiers extend to one issue that has not been widely addressed in the focus literature: the effects of focus on S-modifying ADVP and PP adverbials.
    • 5.3.1 Focus Particles Certain syntactic expressions are “sensitive to focus”, in that their interpretation depends on the placement of focus, and are said to “associate with focus” [Jac72]. One group of adverbs whose sensitivity to focus has been widely investigated are called “focusing adverbs” or, more commonly, focus particles. To illustrate their focus sensitivity, we draw on examples from [Roo95a]. Consider a situation in which John has both read the book and saw the movie “War and Peace” but has read nothing else. In these circumstances, (5.27 a) below is false, but (5.27 b) is true. Contrast this situation with one in which John read both “War and Peace” and “Crime and Punishment”, but saw no movies of either. Then (5.27 b) is false, but (5.27 a) is true. (5.27 a) John only READ War and Peace. (5.27 b) John only read WAR AND PEACE. Since in each situation, the variants differ only in the location of focus, focus is viewed as having a truth conditional effect in the context of the adverb only [Roo95a]. With many other focus particles, the effect is said to be presuppositional [Roo95a]. In (5.28 a), for example, the use of also with focus introduces a presupposition that a proposition of the form John X’d War and Peace, where X is not read, is true. In (5.28 b), the position of the focus has changed, yielding a presupposition that a proposition of the form John read X, where X is not War and Peace, is true. (5.28 a) John also READ War and Peace. (5.28 b) John also read WAR AND PEACE. Similarly, in (5.29 a), the use of even with focus introduces a presupposition that a proposition of the form John X’d War and Peace, where X is not read, is true, while (5.29 b) introduces a presupposition that a proposition of the form John read X, where X is not War and Peace, is true. The use of even also conveys an additional presupposition along the lines that there is something unexpected about John’s reading War and Peace, where what is unexpected corresponds to the focused phrase. (5.29 a) John even READ War and Peace. 241
(5.29 b) John even read WAR AND PEACE.

Generally, the difference between a presuppositional and a truth-conditional effect of focus is shown empirically by constructing two otherwise identical sentences which differ only in the location of focus. If a situation (i.e. a prior context) can be constructed such that the two sentences have different truth values given this situation, then focus has a truth-conditional effect. If no such situation can be constructed, but instead the context the two sentences seem to require differs, then focus has a presuppositional effect ([Roo95a]). All of the focus particles mentioned above are further claimed to presuppose the truth of their containing clause. Whether or not all of these "presuppositions" are semantic presuppositions is a matter of some debate. As noted by [Bie01], for example, [KP79] argue that the presuppositions of focus particles are due to conventional implicature [Gri75] rather than semantic presupposition. We will discuss the difference between presupposition and implicature in Section 5.4. In both structured meaning and alternative semantics approaches, however, the truth of the containing clause is represented as a semantic presupposition.

In Structured Meaning approaches, focus particles are viewed as operators which take the focus and the background, i.e., a structured meaning, as their argument. The meaning of only, for example, is defined by [Hor69] as in (5.30):

(5.30) only combining with the structured meaning (R, a1,...,an) yields:
       the assertion: ∀x1...∀xn [R(x1,...,xn) → (x1,...,xn) = (a1,...,an)]
       the presupposition: R(a1,...,an) is true.

In (5.31 a), for example, there is only one focused phrase: BILL, and so the background of its structured meaning in (5.31 b) is a one-place relation. Thus the semantic value of the clause shown in (5.31 c)-(5.31 d) is an assertion that John introduced nobody other than Bill to Sue, and a presupposition that John introduced Bill to Sue.

(5.31 a) John only introduced BILL to Sue.
(5.31 b) structured meaning: (λx.[introduce(j,x,s)], b)
(5.31 c) assertion: ∀x[introduce(j,x,s) → (j,x,s) = (j,b,s)]
(5.31 d) presupposition: introduce(j,b,s)
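The structured-meaning entry in (5.30)-(5.31) can be sketched procedurally as follows. This is our own illustration, not a formalization from [Hor69]: the background R is rendered as a Python predicate, the focus value is the argument it was abstracted over, and the model of who introduced whom is invented for the example.

```python
# A sketch of Horn-style "only" over a structured meaning (background, focus value).

def only(background, focus_value, domain):
    """Return (assertion_holds, presupposition_holds) for 'only' applied to
    the structured meaning (background, focus_value), checking the assertion
    against a contextually given domain of alternatives."""
    presupposition = background(focus_value)
    assertion = all(not background(x) for x in domain if x != focus_value)
    return assertion, presupposition

# (5.31 a) "John only introduced BILL to Sue", evaluated against a toy model
# in which John in fact introduced both Bill and Tom to Sue.
introductions = {("john", "bill", "sue"), ("john", "tom", "sue")}
background = lambda x: ("john", x, "sue") in introductions   # λx.introduce(j, x, s)

print(only(background, "bill", domain={"bill", "tom", "harry"}))
# (False, True): the presupposition holds, but the assertion fails,
# because John also introduced Tom to Sue.
```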
In (5.32 a), there are two focused phrases, and so R has a corresponding number of arguments.

(5.32 a) John only introduced BILL to SUE.
(5.32 b) structured meaning: (λxλy.[introduce(j,x,y)], (b,s))
(5.32 c) assertion: ∀x∀y[introduce(j,x,y) → (j,x,y) = (j,b,s)]
(5.32 d) presupposition: introduce(j,b,s)

[Kri92] further proposes semantic values within the Structured Meaning framework for a variety of complex focus structures, including multiple focus particles with nested or independent foci.

In Alternative Semantics, focus particles are assigned a lexical semantic value which quantifies over propositions. For example, the focus particle only is defined by [Roo85] as in (5.33), where p is a universally quantified proposition variable.4

(5.33) only combining with a clause α yields:
       the assertion: ∀p[(p ∈ [[α]]f ∧ ∨p) → p = [[α]]0]
       (where [[α]]f is the focus semantic value and [[α]]0 the ordinary semantic value of α)

In Montague's intensional logic, ∨p is understood as meaning that p is true; ∨ evaluates a proposition at the current index, and therefore combines the presupposition and assertion. This rule is different from its Structured Meaning counterpart in that the quantification is at the level of propositions: no alternative to [[α]]0 is both distinct from [[α]]0 and true.

4 The difference between only, whose focus sensitivity yields a truth-conditional effect, and particles such as also and even, whose focus sensitivity yields a presuppositional effect, can be represented semantically in terms of whether the condition involving the alternative set is contributed to the assertion or to the presupposition.

As noted in Section 5.2, restrictive alternative semantics has been introduced to handle cases where the alternative set that is used in the interpretation of focus is a subset of the focus semantic value of a proposition. For example, in the sentence below, the alternative set consists of just three propositions, rather than the full set of propositions of the form John introduced X to Sue (i.e. the full focus semantic value).

  John brought Tom, Bill, and Harry to the party, but he only introduced Bill to Sue.

In restrictive alternative semantics, focus particles are still assigned a lexical semantic value which quantifies over propositions; however, their domain of quantification is no longer the focus semantic value of the proposition, but instead is the covert free variable C introduced by the focus
interpretation operator, whose referent set of alternative phrases is fixed by context, constrained only to be a subset of the focus semantic value of the proposition. The focus particle only is now defined [Roo85] as in (5.34), where p is a universally quantified proposition variable.

(5.34) only combining with a clause α yields:
       the assertion: ∀p[(p ∈ C ∧ ∨p) → p = [[α]]0]

5.3.2 Other Focus Sensitive Sub-Clausal Modifiers

Although also, only, and even are the most commonly studied focus particles, the literature yields a variety of additional phrases categorized as focus particles (see [web, Kon91, QGLS85]). [Kon91] bases these inclusions on the fact that all phrases categorized as focus particles share certain properties. Syntactically, for example, focus particles can occur in a variety of positions in the sentence. Some examples of this positional variation are shown in (5.35). As shown, focus particles often immediately precede (or follow) the focused phrase, and their felicitous use in other positions appears to be dependent on the position of the focused phrase. However, the exact nature of this dependency is still an open question; it varies with respect to each lexical item and is to some extent dialect-specific. For example, many readers will find the last three examples (j-l) awkward or even infelicitous if even is replaced with only, and some readers may prefer even in these examples to be replaced by stressed also.

(5.35 a) Even JOHN showed the painting to Mary.
(5.35 b) JOHN even showed the painting to Mary.
(5.35 c) John even SHOWED the painting to Mary.
(5.35 d) John even showed THE PAINTING to Mary.
(5.35 e) John even showed the painting to MARY.
(5.35 f) John even SHOWED THE PAINTING to Mary.
(5.35 g) John showed even THE PAINTING to Mary.
(5.35 h) John showed the painting even to MARY.
(5.35 i) John showed the painting to MARY, even.
(5.35 j) John showed THE PAINTING to Mary, even.
(5.35 k) John SHOWED the painting to Mary, even.
(5.35 l) JOHN showed the painting to Mary, even.

Semantically, focus particles share the property of being either additive or restrictive (or the related terms inclusive or exclusive) with respect to the way they associate the proposition containing the focused phrase to the other propositions in the alternative set (or alternatively, the way they associate the focused phrase with the background). For example, only is restrictive, because as noted in the prior section it asserts that the proposition containing the focus phrase is the one true proposition in the alternative set. In contrast, even and also are additive, presupposing at least one other true proposition in the alternative set. In addition, focus particles may order (or scale) the presupposed alternatives, and within this ordering, evaluate the position of the alternative containing the focused phrase. For example, even conveys that the alternative proposition containing the focused phrase is less likely than the other alternative propositions within the alternative set.

On the basis of such properties, [web, Kon91, QGLS85] include a variety of additional ADVP and PP in their lists of focus particles. While many of these lexical items have multiple meanings, at least one can be read as additive or restrictive with respect to its interaction with an alternative set in a focus construction.5 For example, the lexical items in (5.36), though more restricted positionally, are very similar in meaning to also, and are often categorized as focus particles.

(5.36) as well, in addition, too

It is not surprising that these lexical items display the same presuppositional sensitivity to focus as does also, as shown by the examples in (5.37). (5.37 a) introduces a presupposition that a proposition of the form John X'd War and Peace, where X is not read, is true. (5.37 b) introduces a presupposition that a proposition of the form John read X, where X is not War and Peace, is true.

(5.37 a) John READ War and Peace, in addition/as well/too.
(5.37 b) John read WAR AND PEACE, in addition/as well/too.

5 See [Kon91] for a discussion of the idiosyncratic meaning and positional variations of other focus particles.
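The additive/restrictive split just described can be stated as two different constraints on a contextually supplied alternative set. The following sketch is our own gloss of the distinction, not a formalization taken from [Kon91]; the propositions and their truth values are invented for illustration.

```python
# Additive vs. restrictive focus particles as constraints on an alternative set.
# Alternatives are (proposition, is_true) pairs; `prejacent` is the proposition
# containing the focused phrase.

def restrictive_ok(prejacent, alternatives):
    """only/just/solely-style particles: the prejacent is true and no distinct
    alternative in the set is true (a truth-conditional contribution)."""
    truths = {p for p, is_true in alternatives if is_true}
    return prejacent in truths and truths == {prejacent}

def additive_presupposition(prejacent, alternatives):
    """also/too/as well/in addition-style particles: presuppose that some
    distinct alternative in the set is also true."""
    return any(is_true for p, is_true in alternatives if p != prejacent)

# (5.37): "John READ War and Peace, too", in a hypothetical context in which
# John also saw the movie.
alts = {("john read W&P", True), ("john saw the W&P movie", True)}
print(restrictive_ok("john read W&P", alts))           # False
print(additive_presupposition("john read W&P", alts))  # True
```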
    • [Kon91] also categorizes the lexical items shown in (5.38) as additive focus particles, though they too are more restricted positionally. (5.38) likewise, similarly, so much as Konig argues that the first two items in (5.38) do not induce an ordering of the set of alternatives, as exemplified in (5.39), and convey a meaning akin to, albeit slightly richer than, also. The last item in (5.38) he treats akin to even, in that it induces an ordering of the set of alternatives and evaluates the alternative containing the focus with respect to this ordering, as exemplified in (5.40). (5.39) JOHN likewise/similarly saw the movie. (5.40) I doubt John will so much as GREET Mary. [QGLS85] distinguishes two types of restrictive focus particles. The first type of restrictive focus particles are similar in meaning to only, as shown in (5.41). (5.41) but, exclusively, just, merely, purely, simply, solely It is thus not surprising that these lexical items display the same truth-conditional sensitivity to focus that only does, as exemplified in (5.42). If John both read the book and saw the movie War and Peace but has read nothing else, (5.42 a) is false, but (5.42 b) is true. If John read both War and Peace and Crime and Punishment, but saw no movies of either, these truth values are reversed. (5.42 a) John but/just/solely READ War and Peace. (5.42 b) John but/just/solely read WAR AND PEACE. The second set [QGLS85] categorizes as restrictive focus particles are shown in (5.43). (5.43) at least, chiefly, especially, in particular, largely, mainly, mostly, notably, particularly, primarily, principally, specifically [QGLS85, 604] view these lexical items as restrictive because they “restrict the application of the utterance predominantly to the part focused”. In contrast, however, [Kon91] argues that this predominance does not make these lexical items (which he calls particularisers) restrictive, but rather indicates that they induce an ordering on the alternatives and evaluate the proposition containing the focused phrase with respect to that ordering, akin to the additive focus particle, even. 246
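The scalar behaviour attributed to even (and, on Konig's view, to the particularisers) can be sketched in the same style. Again this is our own toy gloss rather than a formalization from [Kon91]: the likelihood scores standing in for the contextual ordering are invented.

```python
# A toy sketch of a scalar, ordering-inducing particle such as "even":
# alternatives carry a contextually given likelihood score, and the particle
# (i) presupposes some distinct true alternative and (ii) requires the
# prejacent to rank lowest on that scale.

def even_felicitous(prejacent, true_alternatives, likelihood):
    """true_alternatives: propositions assumed true in the context;
    likelihood: dict mapping each proposition to a contextual likelihood."""
    others_true = any(p != prejacent for p in true_alternatives)
    least_likely = all(likelihood[prejacent] <= likelihood[p]
                       for p in true_alternatives | {prejacent})
    return others_true and least_likely

# (5.29 a) "John even READ War and Peace", against invented likelihoods.
likelihood = {"john read W&P": 0.1, "john skimmed W&P": 0.6, "john bought W&P": 0.9}
print(even_felicitous("john read W&P",
                      true_alternatives={"john skimmed W&P", "john bought W&P"},
                      likelihood=likelihood))   # True
```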
    • [Kon91, 96-97] thus argues that these lexical items are likely additive particles, and states that they “clearly” presuppose the truth of other propositions in the alternative set. For example, (5.44) appears to imply that other people besides young people are also susceptible to peer pressure. (5.44) YOUNG PEOPLE in particular are susceptible to peer pressure. However, we find that this “additive quality” is not clearly apparent in all of the lexical items in (5.43). For example, at least, exemplified in (5.45), does not clearly presuppose the truth of another proposition in the alternative set (that John likes X where X is not Mary), nor does it assert that no other proposition in the set is true, although it does induce an ordering of the elements in the set and evaluate the proposition containing the focused phrase with respect to that ordering. (5.45) John at least likes MARY. In Chapter 3 (Table 72) we saw that many of these adverbials can also modify S. In their Sinternal use, they have been called “focus particles” because the interpretation of the alternative set they presuppose appears to depend upon the presence of focus. Semantically, the alternative set these particles presuppose is equated with the alternative set introduced by the interpretation of focus, which contains the proposition containing the focused phrase as well as alternative propositions in which the focused phrase is replaced by other entities of like semantic type. Indeed, S-internal uses of focus particles in constructions without focus have not even been considered in much of the literature, which implies that they are infelicituous or at least highly restricted. If such cases do exist, it is not clear how the alternative set they presuppose is interpreted, because there is no alternative set introduced by focus interpretation for it to be equated with. Of course, if the focus particle is itself focused, it can refer to a different set than the set invoked by focus on the phrase it modifies, as shown in (5.46). (5.46 i) Speaker A: Does Eva want tea or coffee at lunch? (5.46 ii) Speaker B: She only wants COFFEE at lunch. (5.46 iii) Speaker B: She ONLY wants COFFEE at lunch. If B answers with (ii), he is likely answering the question posed by Speaker A. As noted above, the question provides an antecedent for the alternative set that the interpretation of the focused 247
    • element COFFEE introduces, namely, a subset of the focus semantic value of the clause containing the focus. This set is equated with the alternative set that only presupposes. If B answers with (iii), however, he may both answer the question and supply additional information. For example, B may indicate by answering with (iii) both that Eva wants coffee and that coffee is all she wants for lunch. The latter alternative set, introduced by the focused on only, appears to include propositions such as Eva also wants coffee at lunch, which itself presupposes (via also) that Eva is having something else for lunch. Interestingly, however, not all lexical items that presuppose an alternative set require the presence of focus to interpret that alternative set. For example, the alternative phrases discussed in Chapter 3, including other, such are defined in [Bie01, WJSK03] in terms of alternative sets. The form other X, for example, refers to the result of excluding an entity or set of entities from a contextually relevant alternative set. Thus, other dogs in (5.47) refers to the set of dogs resulting from excluding one or more dogs in the discourse context from a larger presupposed set of alternative dogs. In contrast, such dogs in (5.48) refers to the set of dogs resulting from using one or more dogs in the discourse context as an example of a presupposed set of dogs. (5.47) John likes other dogs. (5.48) John likes such dogs. In other words, other is a restrictive phrase, akin to only6 , and such is an additive or inclusive phrase, akin to also, with respect to the way they associate the proposition containing the modifier to other propositions in the alternative set. However, the notion of focus is not invoked; there need not be a focused phrase in these constructions to determine the set of alternatives under consideration. One wonders why these phrases are not focus particles, such that their interpretation is dependent on or at least sensitive to the presence of focus. Focus and focus particles can be present, as exemplified by the discourse in (5.49). (5.49) Both John and Mary hate Sally’s cats and dogs. But while Mary likes other cats and dogs, John only likes other DOGS. 6 As Bonnie Webber points out, however, other is not exclusive; it can be additive, as shown by the felicity of the use of as well in the following discourse: “Mike only likes poodles, but John likes other dogs as well.” 248
    • In this example, the use of focus on dogs invokes a set of alternative propositions of the form John likes other X. The prior context supplies one particular alternative instantiation of X, namely the cats which Mary likes. In contrast, the use of other invokes an alternative set whose propositions assert the existence of alternative dogs7 ; again the context supplies some particular alternatives, namely Sally’s dogs. One reason, therefore, that Bierner’s alternative phrases are not focus-sensitive in the same way as the focus particles is that the set of alternatives invoked by the use of an alternative phrase do not contain any of the same elements that are contained in the set of alternatives that is invoked by the use of focus or a focus particle. Of course, as shown in (5.50), we can also focus other itself. (5.50) Mike likes poodles. John only likes OTHER dogs. In this case, the alternative set associated with the semantics of other contains propositions asserting the existence of contextually relevant dogs (e.g. the poodles that Mike likes), and this set also contains the “other dogs” themselves, e.g. the subset of dogs that result from excluding poodles from the set of all (salient) dogs. In contrast, the alternative set associated with the focus on other is the set of propositions of the form John likes X dogs, where values for X may include determiners and quantifers such as this, all, a, some, etc, as well as comparatives such as bigger, smaller, faster, etc. The semantics of only requires that these other propositions are false. 5.3.3 S-Modifying “Focus Particles” As noted above, the label “focus particle” has been applied to only S-internal uses of focus sensitive modifiers; their sensitivity to focus in their use as S-modifiers has not been addressed. Moreover, both structured meaning and alternative semantics approaches have defined the semantics of focus particles wholly in relation to a focused phrase. However, by defining the semantics of focus particles in relation to the free variable C (introduced by the focus interpretation operator) that can resolve to a context-dependent alternative set, restrictive alternative semantics can more easily be 7 In Bierner’s semantics, the alternatives are represented as propositions which assert the existence of an entity. 249
    • extended to account for S-modifying, discourse adverbial uses of these adverbs8 . Regardless of the syntactic constituent to which they adjoin, all of the adverbials discussed above presuppose an alternative set of propositions. We thus take the position advocated by [Ern84] and discussed in Chapter 3, that ideally the same semantics should be retained for a given modifier whenever its meaning does not change despite its positional variability. In other words, instead of postulating “homonyms”, the syntactic consitituent corresponding to the external argument in a given semantics should simply be allowed to vary. Note however that much like we saw in Chapter 3 in clause-level adverbial research, it is often the case that focus particle researchers (see [Kon91]) call S-modifying uses of focus particles “conjunctive” or “discourse connective” and exclude them from their analysis. Our position is nevertheless applicable to most of the focus particles discussed above, because, again, regardless of the syntactic constituent they modify, they presuppose an alternative set of propositions. Some require that at least two members of this alternative set be true (e.g. even, also), while others require that only one member of this set be true (e.g. only, solely). Moreover, they can display the same focus sensitivity as S-modifiers that they do as (S-internal) focus particles; however, as S-modifiers, their interpretation is not dependent on the presence of a focused phrase. Consider examples (5.51 a)-(5.52 a), which contain no single focused phrase. (5.51 a) The Mets won the world series. It even rained in Lima. (5.52 a) The Mets won the world series. It also rained in Lima. [Roo95b, p.17] cites example (5.51 a) as a “direction for investigation”, suggesting that its interpretation indicates that it raining in Lima is more improbable than the Mets winning the world series. In Rooth’s view, this example shows that the alternative presupposed by even must be allowed to resolve to the interpretation of the clause The Mets won the world series even in the absence of a focused phrase. In other words, both propositions are contained and ordered in some presupposed alternative set. Although the focus particles in examples (5.51 a)-(5.52 a) are syntactically S-internal, semanti8 This is not to say that this approach is sufficient, or that structured meaning approaches could not also be extended to take these observations into account. 250
    • cally the subject it in these examples is an expletive, used in English because a subject must always be present in declaratives9 . Semantically, moreoever, these focus particles are functioning as discourse adverbials, relating the AO interpretation of the clause they modify to the AO interpretation of a prior clause, by presupposing an alternative set that contains them both. In (5.52 a), for example, the set presupposed by also can be interpreted as containing the event of the Mets winning the world series and the event of it raining in Lima; the resulting interpretation of the discourse may simply be that a set of events occurred. This same interpretation is achieved if also is S-initial and S-modifying, as shown in (5.52 b). (5.52 b) The Mets won the world series. Also, it rained in Lima. Moreover, as noted in Chapter 3, there are 48 naturally occurring instances of S-initial Smodifying also found in our corpus; one example is shown in (5.52 c). Again, there is no apparent focused phrase. The set presupposed by also contains the performances of a variety of different stocks, found as the abstract object interpretations of the four clauses constituting the first sentence. The two instances of S-initial S-modifying even in our corpus are actually mis-parsed NP-modifiers, however, and a search of the raw data in our corpus produced no examples of S-final even. (5.52 c) Pfizer gained 1 7/8 to 67 5/8, Schering-Plough added 2 1/4 to 75 3/4, Eli Lilly rose 1 3/8 to 62 1/8 and Upjohn firmed 3/4 to 38. Also, SmithKline Beecham rose 1 3/8 to 39 1/2. (WSJ) Similarly, in (5.51 a), the set presupposed by even can be interpreted as containing ordered improbable events. Although S-initial S-modifying uses of even appear to be infelicituous, we can achieve the same interpretation in (5.51 b) using S-final S-modifying uses of even. (5.51 b) The Mets won the world series. It rained in Lima, even. If we replace even in (5.51 a) with only, as in (5.53 a), we find that we cannot interpret the set presupposed by only as containing both the event of the Mets winning the world series and the event of it raining in Lima, because the semantics of only asserts that only one proposition in its presupposed alternative set is true. (5.53 a) The Mets won the world series. It only rained in Lima. 9 Although in speech this “rule” does not always hold. 251
    • The alternative set presupposed by only in this case contains a single abstract object. Unless the speaker is implicitly denying the truth of the first sentence, its AO interpretation cannot thus be included in this alternative set. We might therefore simply interpret this discourse as a set of unconnected events. Only on such an interpretation would simply presuppose that in no other locations did it rain. On the other hand, it appears that, on another interpretation, only in (5.53 a) can make use of an inferred discourse relation between the two sentences when selecting its contextually relevant set of alternatives. For example, we might infer a causal relation between the two clauses (which we can also make explicit: The Mets won the world series because it only rained in Lima.). Then the interpretation of it only rained in Lima produces the alternative set containing all contextually relevant locations where it rained, namely New York, and asserts that it did not rain in these locations, yielding the interpretation that the Mets won the world series because it did not rain in New York. Notice however that although only can appear felicituously as an S-initial S-modifier, we find that a different (and difficult to understand) interpretation results in (5.53 b). (5.53 b) The Mets won the world series. Only, it rained in Lima. This peculiarity of S-initial, S-modifying, discourse adverbial uses of only has not been remarked on in the literature that we have investigated. A number of such instances that aren’t difficult to understand are nevertheless found in our corpus, exemplified in (5.54). (5.54 a) His real name’s DiMaggio, only we call him Maggie because he has to take tranquilizers. (Brown) (5.54 c) There might have been a pool of cool water behind any of these tree-clumps: only – there was not. It might have rained, any time; only – it did not. (Brown) Only in all of these cases is paraphrasable by except that. The alternative phrase except for X is discussed in [Bie01]; except is treated as excluding X from a set made available in the containing sentence (e.g. animals in (5.55 a).This analysis extends naturally to the subordinating conjunction except that shown in (5.55 b) and to its adverbial counterpart except shown in (5.55 c), although as discussed below, the question of how AOs (and/or relations between them) make available alterna- 252
    • tive sets of AOs is still an open question. (5.55 a) Except for dogs, Mary hates animals. (5.55 b) Overall, Mary is an animal-hater, except that she loves dogs. (5.55 c) Overall, Mary is an animal-hater. Except she loves dogs. It may be possible to use the semantics of the focus particle only to account for its homonymous uses in (5.54)10 . The clearest case that supports this analysis is (5.54 c), where the modal might makes available a set of alternatives, e.g. it rained, it didn’t rain, and only asserts that only one of these alternatives is true. As noted in [WJSK03, KKW01b], however, all the ways in which an alternative set of AOs can be derived from a sentence or discourse unit are not yet known; we are just beginning to consider AOs as units of interpretation at all. Moreover, it might turn out that these uses of only are best represented as signalling a causal relation akin to but, with the additional feature that the interpretation of the modified proposition is the single fact blocking the normal consequence of the interpretation of the first sentence11 . To illustrate this analysis, consider a modal variant of (5.53 b), shown below in (5.53 c). The interpretation of this discourse is that the single (relevant) fact blocking the Mets winning the world series is the fact that it rained in Lima. A complete understanding the idiosyncratic lexical semantics and resolution properties of all discourse adverbials, including only, requires an annotation study such as described in Chapter 4. (5.53 c) The Mets might/could have the world series. Only, it rained in Lima. So far we have noted only that focus need not be present on a particular phrase in clauses containing discourse adverbial uses of “focus particles”, in order for their presupposed alternative set to be interpreted. Importantly, however, Rooth notes that there is a question of whether there is no focus or whether the entire clause is focused (in (5.51 a)). Certainly, if one advocates the theory of information structure, then there is an information structure to every sentence, although as noted in Section 5.2 our understanding of which prosodic tunes indicate theme and rheme is not complete, and moreover, in many cases information structure is unmarked prosodically [Ste00a]. For example, 10 The analysis would procede similarly to DLTAG’s analysis of otherwise ([WJSK03, KKW01b]) and instead ([CFM 03]). 11 This possibility was suggested by Bonnie Webber, personal communication. Ñ 253
as discussed in [Ste00a], a sentence may be all theme, if it repeats a prior sentence, or all rheme, as in the answer to the question in (5.56) (from [Ste00a]). In such a case, the rheme alternative set is the set of all (contextually relevant) propositions.

(5.56) Guess what? Marcel proved COMPLETENESS!

Whether or not they are marked prosodically, the clauses modified by discourse adverbial uses of focus particles exemplified above could be analysed as "all-rheme". This in fact is the import of what Rooth considers in [Roo95a] when he says that the entire clause might be focused.

5.3.4 Focus Sensitivity of S-Modifying Adverbials

However, S-modifying adverbials can also be sensitive to the presence of focus on a sub-clausal phrase. For example, "adverbs of quantification" have been discussed in [Roo85] as displaying sensitivity to focus, as exemplified in (5.57). As Rooth notes, a bank clerk escorting a ballerina would make (5.57 a) false, but not (5.57 b). An officer escorting a bank clerk would make (5.57 b) false, but not (5.57 a).

(5.57 a) OFFICERS always escorted ballerinas.
(5.57 b) Officers always escorted BALLERINAS.

Although the specific semantics for each adverb depends on the adverb (e.g. always, sometimes, usually, etc.), [Roo85] and subsequent Rooth papers generally argue the following:

- Adverbs of Quantification denote a relation between sets of events, and combine compositionally with an S.
- The S with which an Adverb of Quantification combines denotes a temporal abstract, a set of events which fills the second argument (scope) of the adverb.
- The first argument (restriction) is a free context variable C over sets of events.
- Focus contributes to fixing the value of C.

In Rooth's analysis, the focus interpretation operator constrains the alternative set (C) presupposed by adverbs of quantification to be a subset of the focus semantic value of S. In the case of
(5.58 a), this value will be (5.58 b), which Rooth calls a 'focus closure', the set of events of Mary taking someone to the movies.12

(5.58 a) Usually, Mary takes JOHN to the movies.
(5.58 b) λt ∃y[AT(t, [Mary take y to the movies])]

12 The formula uses the AT operator from tense logic (see [Dow79]), whose syntax is consistent with the event logic discussed in [Kam79] ([Roo85]).

The interpretation of (5.58 a) is then: most events of Mary taking someone to the movies are events of Mary taking John to the movies.

Essentially, we expect that all S-modifying adverbials that presuppose an alternative set of propositions will be sensitive to the presence of focus, because the set presupposed by the adverbial and the set presupposed by focus will contain at least one identical object, namely, the interpretation of the sentence being modified. For example, the alternative sets of both adverbs of quantification and focus contain the modified clause. In contrast, the alternative set presupposed by alternative phrases does not contain at least one object identical to an object in the set presupposed by focus, and so their two sets cannot be equated. This expectation that when two sets contain similar objects it will be difficult to interpret them as two different sets is captured in the AI planning heuristic "use existing objects" [Sac77], i.e. if focus interpretation makes available an alternative set, we will equate other similar presupposed alternative sets to this set if possible. In fact, this author has not come up with a case in which, in the presence of focus and an alternative-set presupposing S-modifier, the two sets they presuppose are not in some sort of set-subset relation. It appears that the set evoked by the modifier is always related to the set evoked by focus. For example, consider (5.58 c).

(5.58 c) Mary is a great grandmother. Every Tuesday she does something fun with each grandchild. Usually, she takes JOHN to the movies.

It appears that we interpret the presupposed set of events associated with usually not as the set of events of Mary taking X to the movies, but rather, either as the set of events of Mary doing something fun with her grandchild John (e.g. Mary takes John to the record store, Mary takes John to the park, etc.), or as the set of events of Mary doing something fun with each of her grandchildren (e.g. Mary
takes Susie to the record store, Mary takes Eva to the park, etc.). However, it also appears that when we produce either of these interpretations, we are also prosodically focusing movies, so that this focus is also producing an alternative set in which the entire verb phrase varies. Further study might indicate that there is some tailored context in which it is possible, however, to distinguish the set presupposed by an adverbial from the set presupposed by a focused phrase, without the presence of additional focus.

S-modifying adverbials that don't presuppose alternative sets can also be sensitive to the presence of focus. For example, [KKW01a] show that the interpretation of the discourse connective although is sensitive to the presence of focus because the consequent of the defeasible rule it presupposes and denies (see [Lag98]) will be made available by the rheme alternative set. For example, (5.59 a) presupposes and denies the expectation that in normal circumstances, if Clyde marries Bertha, he will inherit some money, represented as in (5.59 b). The rheme alternative set of the main clause includes a variety of alternatives, including Clyde inheriting all the money, some of the money, and no money, as exemplified in (5.59 c).

(5.59 a) Although Clyde married Bertha, he did not inherit a PENNY.
(5.59 b) marry(c, b) > inherit(c, money)   (where > is read as a defeasible implication)
(5.59 c) {inherit(c, money), ¬inherit(c, money), ...}

Evaluative S-modifying adverbs have also been argued ([Kri, Kon91]) to be sensitive to the presence of focus. As Krifka states it, (5.60 a) presupposes that there is some alternative X such that it would have been more fortunate for Mary to invite X for dinner. (5.60 b) presupposes that there is some alternative X such that it would have been more fortunate for Mary to invite Bill to X.

(5.60 a) Unfortunately Mary invited BILL to dinner.
(5.60 b) Unfortunately Mary invited Bill to DINNER.

This presupposition should not be attributed to the semantics of unfortunately, however; rather, it arises due to the presence of alternatives created by the use of focus. Notice that Krifka states the presupposition as "it would have been more fortunate" if another alternative had occurred. In other words, the presupposition isn't presupposed true; we have no sense of whether or not Mary
    • invited anyone else to any other meal. The focus simply makes these alternatives available. The fact that these alternatives are perceived as being “more fortunate” arises simply because the alternative set is interpreted not as containing propositions of the form Unfortunately Mary invited X to dinner and Unfortunately Mary invited Bill to X, respectively, but rather their unmodified versions, Mary invited X to dinner and Mary invited Bill to X. That the alternatives are not unfortunate follows; that they are more fortunate than something unfortunate also follows. It is not always the case that an S-modifier is excluded from the propositions in an alternative set invoked by a focused phrase, however. For example, the set of alternatives created by the use of focus in (5.61) appears to be the set of alternative actions John might have taken after getting up; in other words, the relation supplied by then between all alternatives and the prior clause is maintained. (5.61) John got up. Then he LEFT. We see the same effect in other clausal adverbials, including the epistemics, as exemplified in (5.62). Again the set of alternatives created by the use of focus appears to be the set of alternative propositions of the form Probably I saw X, rather than I saw X. In other words, the relation supplied by probably to all the alternatives is maintained. (5.62) Probably I saw MARY. Moreover, it is not the case that all evaluative adverbs are intuitively excluded from the propositions in the set of alternatives. For example, it is easy to imagine a situation in which the set of alternative propositions introduced by focus in (5.63) would have been similarly not surprising. (5.63) Not surprisingly, I saw MARY. On the other hand, the evaluative adverbial by mistake appears to pattern like unfortunately. The alternatives of (5.64) have the form Mary invited X to dinner, rather than By mistake, Mary invited X to dinner. (5.64) By mistake, Mary invited BILL to dinner. It would be interesting to study whether or not it is predictable from the meaning of the adverbial whether the alternatives introduced by focus will include the adverbial or not. 257
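The contrast just described, between adverbials that are retained inside the focus-induced alternatives (then, probably) and those that are stripped from them (unfortunately, by mistake), can be pictured schematically. The sketch below is our own illustration only; the clause template, the names, and the Boolean switch deciding whether the modifier is included are all invented for the example.

```python
# Two ways of building the focus-induced alternative set for a modified clause:
# with the S-modifier applied to every alternative, or with the bare clause only.

def alternatives(modifier, core, focus_slot, domain, modifier_in_alternatives):
    """core: template dict for the proposition; focus_slot: the key that varies
    with focus; returns alternative propositions as readable strings."""
    result = set()
    for value in domain:
        filled = dict(core, **{focus_slot: value})
        clause = "{agent} invited {guest} to {meal}".format(**filled)
        result.add(f"{modifier}({clause})" if modifier_in_alternatives else clause)
    return result

core = {"agent": "Mary", "guest": "BILL", "meal": "dinner"}

# (5.64)-style: the evaluative modifier is excluded from the alternatives.
print(alternatives("by-mistake", core, "guest", {"Bill", "Sue"}, False))
# A probably/then-style modifier is retained in every alternative.
print(alternatives("probably", core, "guest", {"Bill", "Sue"}, True))
```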
    • More generally, there is a significant amount of research on the interaction of focus and discourse connectives; [KKW01b], for example, discuss the effect of focus on the resolution of the anaphoric argument of otherwise, [Umb02] discusses the effect of focus in conjuncted clauses, and [Joh95] discusses the use of focus in when-clauses. 5.3.5 Focusing S-Modifying Adverbials to Evoke Context Focus effects such as those discussed above can also interact with the semantics of clausal adverbials to evoke the prior context. An example is shown in (5.65). Following [Ste00a], the clausal adverbial on December 31 is the rheme and rheme focus, and his policy will expire is the theme. This theme presupposes a rheme alternative set of the form X, his policy expires, where X ranges over temporal coordinates. One alternative temporal coordinate can be derived from the S Michael’s courses are just about to end. Q: Michael’s courses are just about to end. When will his insurance end? A: (5.65) ON DECEMBER 31, his policy will expire. A clausal adverbial can also be a theme focus, as exemplified in (5.66). Prosody on On February 14 indicates the presence of a theme alternative set of the form Y, I’ll be at location X, where Y ranges over temporal coordinates. Alternatives are found in the context, including tonight and the set of times quantified by often in the clause often you travel. Q: Often you travel, and I don’t know if you’ll be home for dinner. For example, I know that you’ll be home tonight, but where will you be (5.66) on FEBRUARY 14? A: ON FEBRUARY 14, I’ll be in HOUSTON. A clausal adverbial can also be a contrastive theme, as exemplified in (5.67)-(5.68). In (5.67), the reply does not answer the question fully, rather it asserts what the answer is likely to be. The rheme focus, the movies, supplies the answer to what, restricting the rheme alternative set of the form we will X tomorrow. The theme focus, probably, indicates the presence of alternative themes, namely an alternative set of propositions of the form Y, we will X tomorrow, where Y ranges over possible epistemic values (e.g. probably, definitely, possibly, etc). 258
    • Q: What will we do tomorrow? A: (5.67) PROBABLY, we will go to THE MOVIES tomorrow. Similarly, in (5.68), the reply does not answer the question fully, rather an opinion of what the answer should be is asserted. The rheme focus, flat, restricts the presupposed set of alternative rhemes of the form Congress should decide on a X tax. The theme focus, my opinion, indicates the presence of alternative themes of the form In Y’s opinion, Congress should decide on a X tax, where Y ranges over contextually relevant individuals. Q: (5.68) I know what kind of tax Fred thinks Congress should implement, but what kind of tax do you think Congress should implement? A: In MY VIEW, Congress should implement a FLAT tax.   DLTAG (see [FMP 01, FW02]) has already argued that nowadays functions as a contrastive theme in examples such as (5.69), where what the husband does nowadays contrasts with what he used to do, or will do in the future. Q: What does your husband do? A: (5.69) NOWADAYS, he takes care of the kids.. Furthermore, we find corpus examples, such as (5.70), which show that nowadays as a theme can contrast with an alternative derived from the VP used to. (5.70) To write a play, the dramatist used to draw on his imagination and knowledge of life. Nowadays, all he draws on someone else’s book. (simplified BROWN example) [Ste00a]’s analysis of the syntax-phonology interface makes use of a range of specific prosodic tunes that have been associated with sub-clausal constituents when they function as theme and rheme. In order to study how prosody effects the discourse coherence of S-modifying adverbials, we must first understand the range of intonation patterns that can be associated with them. [AC74], for example, study the appropriate intonation patterns for British English S-modifying adverbs. It may be the case that there is some specific tune associated with all S-initial, S-modifying adverbials, due to their being located in topic position. Unfortunately, the variety of prosodic theories in the literature (see [PH90]) have not yet yielded reliable annotation of speech corpora, making it difficult to study prosodic effects based on something other than intuition. Moreover, in text corpora, the 259
writer's prosodic intentions are lost; interpretation of prosody relies wholly on intuition.

In addition to being focused or containing focused elements, another way in which focus effects can be employed to cause a clausal adverbial to evoke context is through the use of a focus particle modifier. In our ADVP corpus, approximately sixty tokens are found in which a focus particle is modifying an ADVP adverbial, and in our PP corpus, approximately forty tokens are found in which a focus particle is modifying a PP adverbial. In the majority of cases in both corpora, the focus particle modifier is even. Some examples are shown in Table 5.1. As shown, some of these adverbials are already anaphoric or comparative and so may already depend on abstract objects in the discourse context for their interpretation. Although, as stated above, these examples are drawn from text corpora and it is thus not possible to determine their intonation patterns, that focus particles and focus can cause a clausal adverbial to require the discourse context for its interpretation is exemplified in (5.71), using one of the tokens from Table 5.1. In (5.71), there is rheme focus on grandparent, and theme focus on normal experience. Alternatives to the rheme focus are found in the context as friends and neighbors, while alternatives to the theme focus are derived from the relative clause whose lives are surrounded by crime.

(5.71) Children whose lives are surrounded by crime may frequently face the death of friends and neighbors. However, even in NORMAL EXPERIENCE, many children are likely to face the death of a GRANDPARENT.

Table 5.1: ADVP/PP Adverbials with Focus Particle Modifier

  #    ADVP Adverbial           #    PP Adverbial
  4    even now                 1    also unlike Mr. Ruder
  18   even so                  1    even at a car's length
  3    even then                1    even in normal experience
  1    only a decade ago        1    even in that
  1    only hours earlier       1    even on his tough constitution
  1    even more remarkably     1    even without deals

Interestingly, while our corpus reveals that focus particles are frequently found modifying subordinating conjunctions (e.g. even/only if/when), they do not appear to be found as frequently in our corpus modifying the most common discourse adverbials that take hidden semantic arguments. For
    • example, even/only as a result/in addition all sound odd, and do not appear in our corpus. Moreover, we find that then in (5.72 a) cannot be interpreted as a sequence relation between the event interpretations of the two sentences, akin to after that in (5.72 b). It can only be interpreted as a discourse deictic or anaphoric reference to the temporal coordinate of the first sentence, akin to at that time in (5.72 c)13 . However, even after that and even at that time in (5.72 b)-(5.72 c) respectively, are felicitous. A full understanding of how focus particles combine with discourse adverbials requires further study, and may turn out to be best analyzed using a gradient notion of processing difficulty. (5.72 a) John won’t wake up until late in the afternoon. Even then he will eat breakfast. (5.72 b) John won’t wake up until late in the afternoon. Even after that he will eat breakfast. (5.72 c) John won’t wake up until late in the afternoon. Even at that time he will eat breakfast. 5.3.6 Summary In the prior section we presented analyses that address how the prosodic focus on a sub-clausal constituent can effect the interpretation of the sentence containing it, causing that sentence to be interpreted with respect to the discourse. In this section we showed how these analyses also account for “focus sensitive” S-internal modifiers. In these accounts, the semantics of such ”focus particles” are defined in terms of (e.g. dependent on) the alternative set evoked by the presence of prosodic focus on or in the sub-clausal constituent modified by the focus particle. We then investigated the uses of these focus particles as S-modifiers. We first addressed cases where although there was no obviously focused sub-clausal element, the S-modifier was nevertheless interpretable. It appears that in such cases the discourse context supplies a referent for the S-modifier’s presupposed alternative set. We then addressed cases where both a ”focus sensitive” S-modifier and sub-clausal prosodic focus were present in a clause. Based on the existence of these two cases, we suggest that focus particles are not unconditionally dependent for their interpretation on the presence of (sub-clausal) focus. Rather, the focus “sensitivity” of certain modifiers arises due to the semantic similarity between the alternative set they presuppose and the alternative set that prosodic focus presupposes. Due to this similarity, when both elements are present in a clause, their presupposed 13 The difference between these two interpretation of then is discussed in more detail in Chapter 3. 261
alternative sets display a strong tendency to resolve to the same contextual or accommodated set. Finally, we investigated the effect of focus on S-modifiers that have not previously been classified as "focus sensitive", e.g. that don't presuppose alternative sets. We first presented prior analyses of how focus can affect the interpretation of some common discourse and clausal adverbials. We then illustrated some ways that focus on or in other clausal and discourse adverbials can affect their interpretation with respect to the discourse context.

5.4 Implicatures

As [Hir91] notes, a variety of "meanings" over and above the literal content of an utterance are conveyed when a speaker utters a sentence. Classes of such meanings have been distinguished in the linguistics literature as shown in (5.73).

(5.73)
- Entailments: meanings which must also be true when the sentence is true (see [ODA93])
- Presuppositions: meanings entailed by both the sentence and its negation (see [Bea97])
- Implicatures: non-truth-functional meanings (see [Gri89])
- Illocutionary Force: the speaker's act via the utterance (e.g. asserting, promising) (see [Sea69])
- Perlocutionary Effect: the effect of the utterance on the hearer (e.g. convincing, inspiring) (see [Sea69])

As discussed in Chapter 3, entailment is generally considered part of the truth-functional component of utterance interpretation. Presupposition is sometimes equated with implicature; the latter is the focus of this section; the distinction will also be discussed in this section. We will return briefly to illocutionary force and perlocutionary effect in Section 5.6.

5.4.1 Gricean Implicature

[Gri89] proposed an early and influential pragmatic approach to account for the non-truth-functional meanings conveyed by an utterance. He distinguishes what is said from what is implicated, as
shown in Figure 5.1. What is said represents context-independent meanings that determine the truth conditions of an utterance and can be accounted for with a truth-functional semantics. What is implicated represents non-truth-conditional meanings.

Conventional implicatures are defined as meanings that are both non-truth-functional and context-independent. Many researchers view them as identical to pragmatic presuppositions; both will be discussed later in this section.

Non-Conventional implicatures, in contrast, are defined as non-truth-functional and context-dependent meanings that arise in a given context due to the speaker's and hearer's mutual recognition of rules governing conversation. While such meanings can be linked to non-linguistic context (Non-Linguistic), including aesthetic and cultural knowledge, in this section we focus on those that are linked to language (Conversational).

  Utterance Interpretation
      What is Said
      What is Implicated
          Conventional
          Non-Conventional
              Non-Linguistic
              Conversational
                  Generalized
                  Particularized

Figure 5.1: Gricean Framework

[Gri75, 45] argues that a single Cooperative Principle (CP) is known to conversation participants:

  Make your conversational contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged.

[Gri75, 46-47] asserts four maxims which further specify how the CP is observed:

- Maxim of Quantity: Make your contribution as informative as is required (for the current purposes of the exchange). Do not make your contribution more informative than is required.
- Maxim of Quality: Try to make your contribution one that is true. Do not say what you believe to be false. Do not say that for which you lack adequate evidence.
- Maxim of Manner: Be perspicuous. Avoid obscurity of expression. Avoid ambiguity. Be brief (avoid unnecessary prolixity). Be orderly.
- Maxim of Relevance: Be relevant.

Grice argues that because these rules are standardly followed by conversation participants, purposely exploiting them can convey additional meanings, or conversational implicatures. An oft-cited example of exploiting the Maxim of Quantity is shown by the "letter of recommendation" for a student of philosophy, shown in (5.74).

(5.74) Dear Sir, This student's English is grammatical, and his handwriting is legible. Yours, ...

As [Hir91] notes, it might seem that the maxim is violated, because a philosophy recommendation is generally expected to contain a significant number of favorable statements relevant to the student's skills in philosophy. This letter contains only two statements, which, though favorable, are not very relevant to philosophy. Grice argues however that the writer is in fact obeying the CP and its maxims; the writer conveys by this letter that s/he has said as much as s/he truthfully can about his/her student's skills in philosophy, i.e. that s/he has nothing favorable to say specifically pertaining to these skills. Grice views this as a particularized conversational implicature. Like all conversational implicatures, it is context dependent because changing the context can change or remove it; for example, if the person requesting the letter had stated that the writer was to comment only on the student's English and handwriting, the implicature would no longer arise. It is particularized because the context that licenses it is particular or special. In contrast, a generalized conversational implicature, such as that shown in (5.75 b) which arises when a speaker asserts (5.75 a) (example from [Hor96]), is still context dependent in that changing the context can remove or alter it, but is generally licensed in the absence of particular or marked contexts; a particularized context can remove it, such as a
game of hide and seek in which the speaker knows the location of his wife but does not want to tell the seeker. In each case, however, it is not the proposition itself that licenses the implicature, but the CP, its maxims, and the context.

(5.75 a) My wife is either in the kitchen or in the bathroom.
(5.75 b) I don't know for a fact that my wife is in the kitchen.

In general, Grice argues, speakers obey the CP and its maxims, and rely on the fact that the hearer knows this in order to convey conversational implicatures. Of course, a speaker may choose to lie, thereby violating the CP and the maxims by deliberately misleading the hearer. Moreover, a speaker may choose to opt out of the CP and the maxims, for example, by invoking the Fifth Amendment in a court of law.

It is often difficult to decide precisely which maxim a speaker intends to invoke to convey a conversational implicature. In (5.76), for example, all of the maxims can be related to the inference of the implicature. When a speaker utters (5.76 b) in answer to the question in (5.76 a), s/he licenses the conversational implicature in (5.76 c) by the shared assumption between speaker and hearer that the speaker is saying as much as (Quantity) s/he can truthfully (Quality) say that is relevant (Relevance) and she is saying it in a way that is not ambiguous or obscure (Manner). For example, although (5.76 b) is entailed if the speaker has four dollars, saying (5.76 b) in that case violates the maxims of Quantity and Manner.

(5.76 a) Do you have any money?
(5.76 b) I have three dollars.
(5.76 c) I don't have more than three dollars.

There have been a variety of attempts to formalize the CP and its maxims, as well as attempts to categorize conversational implicatures and formalize how they are inferred (see [Hir91] and references therein). Quantity-based implicatures, that is, generalized conversational implicatures which arise due to the maxim of Quantity, have received a lot of attention in the literature (see [Hor96, Gaz79a, Hir91]). [Hir91] calls them scalar implicatures. She argues that the successful conveyance of a scalar implicature relies on the speaker's and hearer's mutual perception of the ranking
of the speaker's utterance with respect to the other utterances s/he might have uttered instead. Building on earlier work, her theory specifies the conditions under which a speaker may license a scalar implicature and to which a hearer must have access in order to interpret this implicature. In particular, she cites a range of research (including her own) which has shown that quantifiers, modals, conjunctions, numerals, definites and indefinites, spatio-temporal orderings, epistemic verbs and verbs of incompletion, set/subset and entity/attribute and generalization/specialization relations as well as a host of other forms in natural language entail or otherwise evoke orderings (linear as well as hierarchical) that a speaker can use to convey scalar implicature.

In (5.76) we saw an example of a scalar implicature arising from an ordering induced by the use of a cardinal, the number three. In the context of the question posed in (5.76 a), by answering with (5.76 b) the implicature conveyed is that three is an upper bound on the amount of money the speaker has. The lower values are in fact entailed, e.g. if you have three of something, you also have two. If the question had instead been "Can you afford the magazine?" (where it is mutually known that the magazine in question costs three dollars), the implicature conveyed by (5.76 b) due to the maxim of Quantity is that the speaker has more than three dollars, i.e. that he can afford the magazine without breaking his wallet. While values above three are not entailed by the use of three, it appears to evoke the ordering; it is generally accepted that mention of a cardinal may be ambiguous between the readings exactly n, at most n, and at least n [Hir91].

In (5.77) we see an example of a scalar implicature of not all arising from the ordering induced by speaker A's use of the quantifier some, which speaker B then questions, and speaker A must correct (example from [Hir91, 84]). The ordering induced by quantifiers cannot however be defined as logical entailment; in logical terms, ∀x P(x) does not entail ∃x P(x). However, as [Hir91] notes, while this means that universally quantified statements such as All X are Y do not entail Some X is Y, it can be assumed that universally quantified statements such as All of the X like Y are logically represented as ∀x P(x) ∧ ∃x P(x) and so do entail their some X counterparts.

(5.77) A: Well, some of it you can charge to your grant.
       B: Some?
       A: Oh, all.
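The core of the scalar reasoning just described can be sketched very simply. The following is our own toy illustration of the inference pattern, not Hirschberg's formal model: the scales are represented as plain ordered lists, and the hearer infers the negation of every stronger scalar alternative to what was asserted.

```python
# A simplified sketch of scalar implicature over a linearly ordered scale.

def scalar_implicatures(scale, asserted):
    """scale: expressions ordered from weakest to strongest;
    returns the implicated negations of all stronger scalar alternatives."""
    position = scale.index(asserted)
    return [f"not ({stronger})" for stronger in scale[position + 1:]]

# (5.77): asserting "some" on the <some, most, all> scale.
print(scalar_implicatures(["some", "most", "all"], "some"))
# ['not (most)', 'not (all)']

# (5.76): asserting "three" on a cardinal scale implicates an upper bound.
print(scalar_implicatures(["one", "two", "three", "four", "five"], "three"))
# ['not (four)', 'not (five)']
```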
• In the same way, however, modifiers that quantify over persons, places, times, or things induce orderings that a speaker can use to implicate that upper values cannot be asserted truthfully. For example, a speaker’s use of somebody may implicate not everybody, and a speaker’s use of sometimes may implicate not often, not usually, not always. [Hir91] notes that unlike quantifier orderings, modal orderings are entailed in modal systems, where, for example, if a proposition is certain, then it is also possible. In (5.78) we see an example of a scalar implicature of not certain arising from the ordering induced by speaker B’s use of the modal may, which indicates that the proposition is possible (example from [Hir91, 84]). (5.78) A: You were in the neighbourhood of the pantry at one time, were you not? B: I may have been. Hirschberg argues that conjunctions also induce orderings that license scalar implicatures. We saw one example in (5.75); here, as in (5.79), [Hir91] argues that or includes an alternative set of propositions. By asserting only one of these propositions, speaker B implies that the alternatives are false (or unknown). This implicature does not follow logically (i.e. as an exclusive disjunction), because the speaker could cancel the implicature (see below), e.g. by adding and dinner sounds good too. (5.79) A: Do you want to go out to dinner or see a movie? B: A movie sounds good. 5.4.2 Pragmatic and Semantic Presupposition As stated above, [Gri75] defines conventional implicatures as meanings that are both non-truthfunctional and context-independent; he views them as arising by virtue of the meaning of some word or phrase the speaker has used. He distinguished them from conversational implicatures according to their cancelability and detachability. Conversational implicatures are cancelable but non-detachable. For example, the implicature of the letter of recommendation in (5.74) can be canceled by altering the context or appending additional material (e.g. but I don’t mean to suggest that...). However, it is non-detachable, in that expressing the literal content of what is said using different lexical items (e.g. penmanship instead of handwriting) does not remove the implicature. 267
• In contrast, conventional implicatures are non-cancelable and detachable. For example, the conventional implicature that arises from the speaker’s use of but in (5.80 a) is (5.80 b). That it is non-cancelable is shown by the infelicitousness of (5.80 c). That it is detachable is shown in (5.80 d); using and instead of but supplies the same truth-conditional content while detaching the implicature. Moreover, the truth of what is said in (5.80 a) is not dependent on the truth of the implicature in (5.80 b); (5.80 a) is true if and only if Mary is both poor and honest, and false otherwise. (5.80 a) Mary is poor but honest. (5.80 b) There is some contrast between (Mary’s) poverty and (her) honesty. (5.80 c) *Mary is poor but honest, although there’s no connection between (her) poverty and (her) honesty. (5.80 d) Mary is poor and honest. Grice uses the same notions to distinguish what he calls semantic presupposition. In contrast to implicature, presuppositions are neither cancelable nor detachable. For example, the presupposition that arises in (5.81 a) is (5.81 b). That it is non-cancelable is shown by the infelicitousness of (5.81 c). That it is non-detachable is shown in (5.81 d); using ceased instead of stopped supplies the same truth-conditional content and retains the presupposition. Moreover, the truth of what is said in (5.81 a) is dependent on the truth of the presupposition in (5.81 b); (5.81 a) can be true only if (5.81 b) is true (whether its truth value is false or unknown when (5.81 b) is false varies depending on the semantic theory). (5.81 a) Michael has stopped beating his wife. (5.81 b) Michael has been beating his wife. (5.81 c) *Michael has stopped beating his wife, although he never beat her in the first place. (5.81 d) Michael has ceased beating his wife. Similarly, [Kar73] distinguishes semantic and pragmatic presupposition. In this view, a semantic presupposition of a sentence must be true whether that sentence is true or false. A pragmatic presupposition of a sentence, in contrast, must be entailed in contexts where that sentence 268
• is felicitously uttered. In [KP79], pragmatic presupposition is equated with conventional implicature, and it is noted that many of the linguistic investigations in the literature that have invoked the notion of presupposition are in fact invoking pragmatic presuppositions. The confusion over whether something is a semantic or pragmatic presupposition has been frequently discussed and debated in the literature, either generally, or with respect to specific linguistic expressions that trigger them. Some doubt that semantic presuppositions exist at all, arguing that presuppositions don’t assert requirements on truth-conditions, but rather on the appropriateness of utterances in context (see [Bea97, Hor96, Gaz79b, Sta74, KP79, Str59]). [vdS92], on the other hand, has argued that presuppositional expressions are in fact anaphoric expressions, while [Sim00] argues that the presuppositions associated with change-of-state verbs (e.g. start and stop) are actually conversational implicatures. Some particular expressions to which presupposition has been attributed are shown in (5.82); many of these have been discussed in this thesis.
(5.82)
• definite, quantified, anaphoric, comparative and factive descriptions (see [HK98, Bea97, KK70])
• focus particles and focus (see [Roo95a])
• discourse connectives (see [Lag98, KKW01b, Ste00b, KP79])
• factive, change-of-state, and judgment verbs (see [KK70, Bea97])
• clefts and WH-questions (see [Pri86])
In semantic theories, presupposition is usually defined as a binary relation between pairs of sentences [Bea97]; one sentence might presuppose another in a semantic theory if the truth of the second is a precondition for the first to be true or false, as discussed above. The representation of the presupposition has been variously handled in semantic theories, depending on the trigger and the theory; partial functions, hidden arguments, and assignment functions have all been employed for this purpose, as discussed in Chapter 3. Standard tests for semantic presupposition include embedding 269
    • under negation or modal operators, and discourse context tests, although the latter hold for both types of presupposition, and the tests are not always applicable or useful for a given expression. In pragmatic theories, presupposition is defined in terms of the attitudes and knowledge of language users [Bea97], with or without reference to specific linguistic forms (e.g. the sentence). The representation of presuppositions in pragmatic theories may employ modal operators expressing intention, belief, and mutual belief, as in [Hir91]. In most theories of presupposition, the presupposed meaning is generally assumed to be either known, found or inferable from the context, or accomodatable. 5.4.3 Summary In this section we have introduced the notion of conversational implicature as additional meaning that arises from the speaker’s assumption that discourse is coherent. Conversational implicatures are used by a speaker to convey additional meaning over and above the literal content of what he says. We have distinguished conversational and conventional implicature according to the notions of cancelability and detachability, and discussed how analyses of conventional implicature overlap with semantic and pragmatic theories of presupposition. In the next section, we discuss how a speaker can use S-modifying adverbials to convey implicatures. 5.5 Using S-Modifying Adverbials to Convey Implicatures In this section we briefly review how our analysis in this thesis has already invoked the notion of presupposition which existing semantic (or pragmatic) theories must account for. We then show how a speaker’s use of S-modifying adverbials can convey meanings akin to what Grice has called conversational implicatures, which discourse theory must also account for. 5.5.1 Presupposition In this thesis, we have encountered a variety of uses of the term presupposition as a semantic notion, and a variety of semantic representations for it, depending on the environment in which it was encountered. For example, in our ADVP/PP adverbial data set, we saw that the internal PP argument 270
• or the ADJ derivative of the ADV can be (or contain elements that are) anaphoric, quantified, a definite description, comparative, and/or contain a hidden argument. We also saw that the adverbials in our data set can interact with focus. While we focused our discussion on how semantic representations of these linguistic forms could be extended to their use within S-modifying adverbials, thereby accounting for the fact that they may depend for their interpretation on the AO interpretation of non-NP constituents in the prior context, we saw above that the dependency of these linguistic forms on their context can be viewed as either a semantic or pragmatic presupposition. 5.5.2 Conversational Implicatures It is not always the case that the dependency of an S-modifying adverbial on discourse context can be accounted for wholly semantically. For example, as noted in the introduction, S-modifying adverbials such as actually or really take only one AO argument semantically: the interpretation of the modified clause. As discussed in Chapter 3, these adverbs supply epistemic features to the S they modify. While focus likely plays a role in their interpretation, conversational implicature can also be involved. A variety of examples of S-modifiers similar in meaning to actually are shown in Table 5.2 along with their corpus counts.
Table 5.2: Higher-Ordered Epistemic Adverbials Yielding Implicatures (adverbial, corpus count)
  actually (31)     in fact (105)     in reality (2)
  in truth (1)      really (6)        surely (16)
In Section 5.4 we discussed how modals induce modal orderings such that a speaker’s use of a modal can yield a scalar implicature. In (5.78), for example, the speaker’s use of may was argued to implicate that the truth of the proposition is not certain or not known to him/her. In (5.83), on the other hand, [Hir91] argues that by denying the assertion of a higher value in the modal ordering, the speaker conveys that the lesser can is true or unknown to him/her. In both cases the modal value involved in the implicature is present in the context (i.e. in A’s utterance). 271
• (5.83) A: I would like to know if I can take off the back plate. B: You shouldn’t have to. As noted in Section 5.4, modal orderings are entailed in modal systems; for example, if a proposition is actual, real, sure, or in the set of facts, then it is also possible, probable, etc. Thus, while a speaker cannot assert something is actual, real, sure, or in the set of facts and at the same time implicate that a lesser modal value of the proposition is not known or certain to him/her, we argue that the speaker can use these higher modal valued S-modifiers when he believes that a lesser modal value of the proposition is not mutually known or certain. In other words, people do not normally assert that a proposition is true, since to do so would violate (at least) the maxims of Quantity (do not make your contribution more informative than required) and Manner (be brief); they do so only when there is evidence indicating to the speaker or writer that the hearer or reader may suppose the modal value of the proposition to be false or unknown, or its truth unexpected. Such evidence might come from the context. For example, in (5.84), from our corpus, the article writer cites the quote as from Cervantes. The writer of the reply appears to take this as evidence that the article writer did not know that King Solomon is the original source. The implicature arising from the reply writer’s use of actually, that the truth of the modified proposition was not mutually known, is however both cancelable (e.g. if the writer appends, “but I bet you already knew that”) and non-detachable; the writer can substitute, for example, in fact, and still achieve the same implicature. (5.84) Your Oct. 2 article on Daniel Yankelovich cited the quote “A good name is better than great riches” as being from Cervantes’ “Don Quixote.” Actually, Cervantes borrowed that quote from a writer of some 25 centuries earlier: Israel’s King Solomon wrote those words in the Book of Proverbs (22:1). (WSJ) These adverbials can also be used when context implies a proposition that is “hard to believe”, as in (5.85), also from our corpus. (5.85) Cathryn Rice could hardly believe her eyes. While giving the Comprehensive Test of Basic Skills to ninth graders at Greenville High School last March 16, she spotted a student looking at crib sheets. She had seen cheating before, but these notes were uncanny. “A stockbroker is an 272
• example of ...” Virtually word for word, the notes matched questions and answers on the social-studies section of the test the student was taking. In fact, the student had the answers to almost all of the 40 questions in that section. (WSJ) Furthermore, as in (5.86), these adverbials can be used to explicitly deny the truth value of a proposition that has been asserted by another speaker. (5.86) A: You are wrong. B: Actually/Surely/Really/In fact, I’m not wrong. Of course, [Hir91]’s analysis of (5.78), where the speaker’s use of may can implicate that the truth of the proposition is not certain or not known to him/her, can be directly extended to S-modifying adverbials that assert a lower value in the modal ordering; these too can be used by a speaker to implicate that a higher value is false or unknown. Some examples of S-modifying adverbials whose internal argument is interpreted as a lower-ordered modality and can be used for this purpose are shown in Table 5.3.
Table 5.3: Lower-Ordered Epistemic Adverbials Yielding Implicatures (adverbial, corpus count)
  in a sense (4)        in a way (4)        to a degree (1)            to an extent (1)
  in one sense (1)      in one way (2)      in certain respects (2)    with few exceptions (5)
The interpretation of these adverbials as conveying a modality is due to the interpretation of their internal argument, as discussed in Chapter 3. Their interpretation as conveying a lower-ordered modality, however, is due to the determiners that modify these internal arguments. [Hir91]’s analysis of (5.77), where A’s use of the quantifier some can yield a scalar implicature not all, thus also applies to S-modifying adverbials, as does her analysis of quantifiers over people, places, times and things (e.g. sometimes), and her analysis of cardinals yielding scalar implicatures of at most n or at least n or exactly n. Some examples of S-modifying adverbials containing lower-ordered quantifiers over people, times and places that can be used to convey such implicatures are shown in Table 5.4. Interestingly, however, it appears that PP S-modifying adverbials containing the higher-ordered 273
• quantifiers all or any can also be used by speakers to convey an implicature. As noted in Section 5.4, in contrast to modal and temporal orderings, ∀x P(x) does not entail ∃x P(x). We argue, however, that this can be the implicature caused by the use of the adverbials in Table 5.5. Corpus examples containing adverbials from the first column of Table 5.5 are shown in (5.87)-(5.89).
Table 5.4: Lower-Ordered Quantificational Adverbials Yielding Implicatures (adverbial, corpus count)
  at one point (7)      at times (6)             for a moment (10)
  for those few (1)     on two occasions (1)     to some (1)
Table 5.5: Higher-Ordered Quantificational Adverbials Yielding Implicatures (adverbial, corpus count)
  in any case (24)      in any event (13)        at any rate (6)
  above all (9)         after all (49)           anyway (10)
(5.87) Creative accounting is a hallmark of federal credit. Many agencies roll over their debt, paying off delinquent loans by issuing new loans, or converting defaulted loan guarantees into direct loans. In any case, they avoid having to write off the loans. (WSJ) (5.88) His arm had been giving him some trouble and Rector was not enough of a medical expert to determine whether it had healed improperly or whether Hino was simply rebelling against the tedious work in the print shop, using the stiffness in his arm as an excuse. In any event Rector sent him to the local hospital to have it checked... (BROWN) (5.89) Manchester’s unusual interest in telegraphy has often been attributed to the fact that the Rev. J. D. Wickham, headmaster of Burr and Burton Seminary, was a personal friend and correspondent of the inventor, Samuel F. B. Morse. At any rate, Manchester did not lag far behind the first commercial system which was set up in 1844 between Baltimore and Washington. (BROWN) As discussed in Chapter 3, due to their modification by all or any, the abstract objects case, event, rate do not need to be identified with sets of abstract objects in the context. They thus do not function semantically as discourse connectives. 274
• We argue that a speaker’s use of these adverbials can, however, implicate that there is some relevant abstract object or set of abstract objects in the context which should be viewed as contained in the sets identified with these internal arguments and which thus should be related to the AO interpretation of the modified proposition via the preposition head of the PP adverbial. Our basis for this argument is simply that, unless there is a relevant abstract object (or set) under consideration, people do not normally assert the relation of a proposition (or other abstract object) to the set of all cases, events, or rates; to do so would violate (at least) the maxims of Quantity (do not make your contribution more informative than required) and Manner (be brief). In the above examples, the context does contain a relevant case, event, or rate. Once an implicated (set of) abstract objects in the context is identified, the particular relation supplied by the preposition can be interpreted; of course, in most cases this relation will be a metaphoric interpretation of the preposition, or even idiosyncratic to the adverbial. However, these implicatures are cancelable, depending on context, but non-detachable. For example, a speaker could say: “At any rate, although I’m not considering anything in particular that we’ve discussed so far”, and a speaker can use in any event and at any rate interchangeably. We argue that above all creates the same implicature; by using an adverbial that relates the modified proposition to some unspecified set of all, the speaker implicates that some relevant set of abstract objects is to be found in the context. An example where this is the case is shown in (5.90) (the relevant set includes all the AOs that the president did or did not want). Of course, above all does not assert that the modified proposition is physically above all other relevant propositions; rather it asserts that the modified proposition is the most important (above in status) proposition among all relevant others. (5.90) The President had set for himself the task, which he believed vital, of awakening the U.S. and its allies to the hard and complex effort necessary to shift that balance. He did not want the effort weakened by any illusion that summit magic might make it unnecessary. He wanted time, too, to review the United States’ global commitments and to test both the policies he had inherited and new ones he was formulating. Above all, he did not want to appear to be running hat in hand to Premier Khrushchev’s doorstep. (WSJ) 275
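To make the shape of this argument concrete, one rough first-order gloss is the following; the notation is ours and purely illustrative, with π standing for the AO interpretation of the modified clause, R for the (often metaphoric) relation contributed by the preposition, and salient for whatever contextual relevance amounts to:

    asserted (roughly, "in any case, p"):   ∀x [case(x) → R(π, x)]
    implicated:                             ∃x [case(x) ∧ salient(x) ∧ R(π, x)]

The assertion does not entail the implicature, since ∀x P(x) does not entail ∃x P(x); the implicature arises only because relating a proposition to the set of all cases would otherwise be gratuitously uninformative.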
• While we are not attempting to completely account for the idiosyncratic meaning of these adverbials, we do believe these implicatures can be employed and can account for one part of their interpretation. Note, however, that it is sometimes difficult to understand how the assumption of discourse coherence yields the context dependency of a discourse adverbial. For example, while it may be that the same implicatures arise when a speaker uses the adverbials after all and anyway, the relation of their linguistic form to their meaning is much more abstract. Consider the examples from our corpus in (5.91)-(5.92). (5.91) The stock market’s dizzying gyrations during the past few days have made a lot of individual investors wish they could buy some sort of insurance. After all, they won’t soon forget the stock bargains that became available after the October 1987 crash. (WSJ) (5.92) There are many, many things to do. Find out what you like to do most and really give it a whirl. If you can’t think of a thing to do, try something – anything. Maybe you will surprise yourself. True! We are not all great artists. I, frankly, can’t draw a straight line. Maybe you are not that gifted either, but how about puttering around with the old paints? You may amaze yourself and acquire a real knack for it. Anyway, I’ll bet you have a lot of fun. (BROWN) As these examples indicate, after all doesn’t assert that the modified proposition comes temporally or textually after all that preceded it; rather, it appears that in many cases it can be preceded by because, and it appears to indicate that the modified proposition is among the most significant, and already known, possible causes. It may be that after all should be analyzed as taking as its argument, or implicating, a set of causes arising from an inferred or explicit causal relation between the modified clause and the prior discourse. [WJSK03] propose a similar analysis for for example, whose interpretation appears to be parasitic on a previous explicit or inferred structural relation, but while for for example this relation may be one of cause, result, or elaboration, or supplied by the idiosyncratic meaning of a verb, after all seems to require a cause. Anyway appears to indicate a return to a discussion that got off on a tangent. It may be that anyway should be analyzed as taking as its argument, or implicating, one or more elaborations arising from an inferred or explicit elaboration relation between the modified clause and the prior discourse. 276
• This discussion is exploratory; while it is clear that the modified clauses of these adverbials are being related to the prior discourse, either through focus, implicature, idiosyncratic lexical semantics, or some combination thereof, in order to determine whether or not it is useful to consider their linguistic form and its interaction with implicatures as contributing to their interpretation, an annotation project such as that described in Chapter 4 is required. 5.5.3 Interaction of Focus and Implicature Thus, we can invoke the notion of conversational implicature to explain why S-modifiers have been treated as discourse connectives even though their linguistic form does not cause them to refer to an abstract object in the prior discourse: by using them the speaker has created an implicature. In many of the cases where implicatures arise, there also appears to be focus. It was noted in Section 5.2 that focus can affect the interpretations of conversational implicatures. [Hir91] also discusses how intonation can affect the disambiguation of various implicatures; for our purposes, we note that the two mechanisms of discourse coherence interact; it may be that to induce an awareness of the ordering invoked by linguistic forms, they must be focused. This requires further study, however. 5.5.4 Summary In this section we have addressed how S-modifying adverbials can be used by a speaker to create conversational implicatures, causing the sentence containing them to be interpreted with respect to the discourse. We have shown that [Hir91]’s analysis of how scalar implicatures can arise through the use of lexical items that induce orderings can be directly applied to the use of these forms in or as S-modifying adverbials. We have further postulated two additional types of implicatures that can arise when the speaker’s observance of the CP and its maxims interacts with the requirement that discourse be coherent. 277
• 5.6 Other Contributions 5.6.1 Discourse Structure Many researchers have studied the pragmatic effects of discourse connectives on discourse structure and interpretation. As discussed in Chapter 2, [GS86, Sch87] view many adverbials, and intonation, as cues of discourse structure. For example, anyway might signal a return (pop) to a preceding segment, while now might signal the embedding (push) of a segment. As discussed in Chapter 3, viewing the interpretations that a discourse unit makes available as abstract objects may enable at least some of these “pragmatic” uses to be accounted for in the semantics of the DLTAG model. 5.6.2 Performatives As noted in Section 5.4, there are other meanings associated with utterances that have not been discussed in this thesis; in particular, their illocutionary force and perlocutionary effect. [Sea69] associates these additional meanings conveyed by an utterance with, among other things, intonation and “performative verbs”, e.g. I promise you..., I baptize you..., etc. It may be that the use of adverbials can have a similar force and effect, which should eventually be incorporated into the DLTAG model. 5.7 Conclusion In this chapter we have explored two ways apart from their predicate argument structure and interpretation that adverbials can be used to contribute to discourse coherence. Our intent was to demonstrate how prosodic focus and implicature can cause an adverbial which is not normally dependent on the discourse for its interpretation to be interpreted with respect to the discourse. While our analysis was preliminary, it makes the important point that adverbial semantics is not the only factor influencing the interpretation of adverbials, and should not be considered in isolation as the only mechanism causing adverbials to create discourse coherence. It is our expectation that all of these factors will eventually have to be incorporated into a comprehensive model of discourse. 278
• Chapter 6 Conclusion 6.1 Summary The underlying theme of this thesis is that discourse is not a completely separate category from syntax and semantics and that discourse-level coherence can arise from the same substrate as clause-level coherence. We have overviewed similarities and differences between a variety of theories of discourse coherence, which, taken together, distinguish different modules required to build a complete interpretation of discourse. We discussed DLTAG ([FMP 01, CFM 02, WJSK03, WKJ99, WJSK99, WJ98]) as a theory that bridges the gap between clause and discourse modules, by using the same syntactic and semantic mechanisms that build the clause interpretation to build an intermediate level of discourse interpretation on top of the clause interpretation. In DLTAG, cue phrases, or discourse connectives, are predicates, like verbs, except they can take interpretations of clauses as arguments. For coordinating and subordinating conjunctions, both arguments come structurally. For adverbial cue phrases, which are mainly adverb (ADVP) and prepositional (PP) phrases, only one argument comes structurally. Based on considerations of computational economy and behavioral evidence, DLTAG argues that the other argument of these adverbials must be resolved anaphorically. However, while DLTAG proposes that certain adverbials function as discourse connectives, it does not isolate this subset from the set of all adverbials. 279
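The contrast between the two kinds of connectives can be pictured with a small, purely illustrative data structure; the class and field names below are our own and do not correspond to the DLTAG implementation.

    # Illustrative sketch only: discourse connectives as predicates over
    # abstract object (AO) interpretations of clauses.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Connective:
        lemma: str
        structural_arg: str                  # AO of the clause supplied by the syntax
        other_arg: Optional[str] = None      # conjunctions: also structural;
                                             # adverbials: left open, resolved anaphorically

    # A subordinating conjunction gets both arguments structurally.
    because = Connective("because", structural_arg="AO(clause2)", other_arg="AO(clause1)")

    # A discourse adverbial gets one argument structurally; the other must be
    # found in the prior discourse or spatio-temporal context.
    as_a_result = Connective("as a result", structural_arg="AO(clause2)")
    as_a_result.other_arg = "AO resolved anaphorically from the prior discourse"

The only point of the sketch is that, for adverbials, the second slot cannot be filled at composition time; characterizing which adverbials carry such a slot is what the corpus investigation summarized below set out to do.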
• Because the set of adverbials is compositional, and therefore infinite ([Kno96]), it is not possible to list all of the adverbials that function as discourse connectives. We have therefore presented a corpus-based investigation of how semantics and pragmatics cause certain adverbials to be classified as discourse connectives, while others are not. Our investigation has shown that in many cases discourse connectives are not an accidental grouping of ADVP and PP adverbials; rather, their discourse properties arise naturally from their semantics. We have distinguished discourse adverbials and clausal adverbials semantically in terms of their predicate argument structure and interpretation. We have argued that whether or not ADVP and PP adverbials found in a corpus are classified as discourse adverbials depends on the interpretation of their semantic arguments. We have shown that discourse adverbials are very similar to discourse deixis, in that both require for their interpretation an abstract object made available in the prior discourse or spatio-temporal context. Our semantic analysis is presented independently of any particular semantic formalism, partly because how information is represented in a model depends, at least to some extent, on what computational problems the resulting model will be used to solve, and whether the expense of a particular representation produces worthwhile results. Nevertheless, our results must be taken into account when building a discourse model, for in their entirety the number of discourse adverbials far overshadows the few discourse connectives that have been so far addressed in the literature. We have thus overviewed a variety of semantic formalisms and a variety of clause- and discourse-level syntax-semantic interfaces. Drawing on this research, we presented one way in which the predicate-argument structure and interpretation of discourse adverbials can be incorporated into a syntax-semantic interface for DLTAG. We also discussed the DLTAG annotation project, whose goal is to annotate the arguments of all discourse connectives in the Penn Treebank corpus. It is not only due to their argument structure that adverbials appear to require discourse for their interpretation, however. We have encountered a number of adverbials that have been treated as discourse connectives despite the fact that their discourse semantics alone does not cause them to be interpreted with respect to abstract object interpretations in the discourse or spatio-temporal context. We thus have explored other explanations for why such adverbials can require discourse context for 280
• their interpretation; in particular, those that involve the interaction of their semantics with other aspects of discourse coherence. We discussed prosody as a semantic mechanism of discourse coherence and showed how focus effects in both clausal and discourse adverbials contribute to discourse coherence. We discussed Gricean implicature as an additional aspect of meaning that arises from the assumption of discourse coherence and showed how clausal and discourse (S-modifying) adverbials can be used to convey implicatures. In summary, we’ve shown that the discourse semantics of adverbials can go a long way towards building a complete model of discourse interpretation, but other aspects of discourse coherence must also be taken into account, and corpus annotation and analysis is also required, to allow us to better understand the empirical realization of discourse syntax and semantics and the correspondences between discourse connectives and discourse relations, such as when and which discourse connectives are used, versus when and which discourse relations are inferred. 6.2 Future Directions We end by identifying a number of other issues for future study that are suggested by the investigations in this thesis. Each of these constitutes a broader use of the syntactic, semantic and pragmatic functions of adverbials. The first line of research concerns the representation, derivation, and resolution of abstract object interpretations. Although, as stated above, how information is represented in a model depends, at least in part, on what computational problems the resulting model will be used to solve, because DLTAG is built on top of a clause-level module, ideally the semantics employed for both discourse deictic reference to abstract objects and adverbial modification of abstract objects would be similar. However, the issue of how to represent and derive abstract object interpretations is still an open question. For example, as noted in Chapter 2, the fact that abstract object interpretations are not grammaticalized as nouns prior to discourse deixis reference, and the fact that there appear to be structural restrictions on discourse deictic reference to them, have led some researchers to argue that abstract objects are not present as entities in the discourse model prior to discourse deixis reference. According to these researchers, their entity reading is added to the discourse model via discourse 281
• deixis reference. [DH95], for example, use type coercion along with other computational operations to access AO interpretations, while [Web91] uses referring functions. In contrast, [Sto94] and [Ash93] argue that AO interpretations are already present in the discourse model before discourse deixis reference to them. [Sto94] uses a possible worlds semantics in which discourse deixis refers to information states, while [Ash93] represents events as hidden arguments to verbs and facts and propositions as pieces of semantic structure, both of which can be the referents of discourse deixis. We saw in Chapter 3 that [Ern84, Moo93, Ver97, KP79] all make use of the basic abstract objects in exploring the semantic interpretation of adverbial modification, although only [Moo93] formalizes the representation of these entities, treating both events and facts as entities already present in the domain of individuals, and using predicate logic to represent both as hidden argument variables of their associated predicates. What all of these researchers agree on is that AO interpretations must be derivable from non-NP constituents. However, this work has only considered the question in the context of discourse deixis or a few adverbials, and moreover only considered a relatively small number of AO interpretations. In Chapter 3 we introduced additional complexity into the analysis, namely that the possible objects modified by adverbials correspond to a much wider range of objects than previously considered. We nevertheless believe that by extending the range of objects to be considered, the current study may lead to a more comprehensive solution. It may, for example, be the case that all AOs can be subsumed within [Ash93]’s existing classification; in other words, it may be feasible to treat all AOs as sub-types of events, fact-like objects, or proposition-like objects, as [Ash93, Ven67] begin to do. If this is the case, we could treat hidden arguments as “place-holders” for abstract objects, and allow them to be coerced to specific denotations, such as consequences and reasons that are dependent on and determinable from the predication on them. The problem of abstract object anaphora resolution has also been far more studied for discourse deixis than for discourse adverbials. Fully understanding both the mechanisms that determine which elements are accessible to function as antecedents and how AO interpretations are derived will, however, require the production and analysis of an annotated corpus such as that described in Chapter 4, and should, moreover, consider the analyses already proposed for discourse deixis. 282
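To give the “place-holder” idea above a concrete shape, a hidden-argument rendering in the Davidsonian style of [Dav67, Moo93] might look as follows; the particular predicates and the decomposition are our illustration only, not a proposal from those works.

    The student cheated on the test.     ∃e [cheat(e, student, test)]
    As a result, she was expelled.       ∃e′ [expelled(e′, student) ∧ result(e′, a)]

Here e and e′ are hidden event arguments, and a is a place-holder argument contributed by the adverbial: it must be identified with an abstract object made available by the prior discourse (here, the cheating event), and it is the predicate result that licenses coercing that object to a cause-like denotation.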
• Another open question concerns the incorporation of abstract objects into Centering Theory ([WJP81]). As discussed in Chapter 2, Centering Theory models discourse processing factors that explain differences in the perceived coherence of discourses. The central idea is that each utterance presents a center, e.g. a topic entity; the most coherent discourses are those in which, across a series of utterances, the center remains constant, and can be referred to by a pronoun. To date, however, reference to abstract object entities in CT has only been considered in terms of discourse deixis reference (see [Eck98]). The effect of adverbial reference to abstract object entities on discourse coherence has not been considered. It would be very interesting to study if and how CT can “scale up” to the discourse level and incorporate non-reified clausal interpretations and relations between them supplied by discourse connectives and inference. Another line of research concerns the extension of the corpus-based investigations performed in this thesis to adverbials in other languages. For example, in English we were often able to illustrate semantic arguments by making them overt (e.g. in addition to this). According to a German speaker1, in German, similar semantic arguments are not optional; they are either explicit or lexically incorporated into the adverbial. For example, as shown in (6.1) the German equivalent of as a result is infelicitous, but the equivalent of as a result of that is fine, and even better is the case where the preposition, demonstrative, and indefinite noun are incorporated into a single lexical item.
(6.1) bad: “als Folge” (“as a result”)
      fine: “als Folge dessen” (“as a result of that”)
      better: “demzufolge” (“dem” = “that”, “zu” = “as”)
While we find a discussion in [Ale97] of some cross-linguistic properties of adverbials, including similarities in scope, position and meaning, it would be well worthwhile to study the semantic equivalents across languages of the discourse adverbials found in our data set, to see if the same semantic mechanisms are employed to cause them to function as discourse connectives. In this way we would also further enlarge the data set, enabling the development of a widely applicable adverbial semantics. 1 SIGDial reviewer, personal communication 283
• Another important line of research concerns the practical application of the investigations in this thesis and of DLTAG in general. In Chapter 3, for example, we discussed one natural language generation system (SPUD, [SD97]). It would be interesting to see if incorporating discourse syntax and semantics into such systems improves their efficiency and results. The corpus annotation of discourse syntax and semantics, moreover, should lead to anaphora resolution algorithms for the anaphoric arguments of discourse adverbials, and may also help improve question-answering and other information-retrieval systems, which need to know what to look for, preferably with a minimum of human interaction. For example, if such a system runs across as a result in some text, then it should know immediately to look for a cause, even if this object is not explicitly stated in the same sentence or prior sentences (a toy illustration of such a lookup is sketched below). Finally, a significant task remaining is to investigate how the DLTAG structures and interpretations built from the syntax and semantics of discourse connectives can be incorporated into high-level modules of both discourse and dialogue. While DLTAG simplifies the construction of an intermediate level of discourse, the relationship between how DLTAG discourse trees combine to make entire discourses and dialogues, the constraints, if any, these structures place on anaphora resolution, and how inference and other aspects of discourse coherence, including focus and Gricean implicature, are to be incorporated, all remain very interesting subjects for future study. 284
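As a toy illustration of the kind of lookup such a system could exploit, consider the following sketch; the table and its entries are hypothetical assumptions made for this example, not output of the annotation project.

    # Hypothetical sketch: what kind of abstract object to search the prior
    # text for, given a discourse adverbial. The entries are illustrative only.
    ANAPHORIC_TARGET = {
        "as a result": "a cause of the situation described by the modified clause",
        "otherwise":   "a condition or alternative the modified clause depends on",
        "after all":   "an already-shared cause supporting the prior claim",
        "for example": "a generalization (or prior relation) the clause instantiates",
    }

    def what_to_look_for(adverbial):
        """Return a description of the abstract object a resolver should seek."""
        return ANAPHORIC_TARGET.get(adverbial.lower(), "unknown; fall back to general inference")

    print(what_to_look_for("as a result"))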
    • Bibliography [AC74] D. Allen and A. Cruttenden. English sentence adverbials: Their syntax and their intonation in British English. Lingua, 34:1–30, 1974. [Ale97] A. Alexiadou. Adverb Placement: A Case Study in Antisymmetric Syntax. John Benjamins Publishing Company, 1997. [Ash93] N. Asher. Reference to Abstract Objects. Kluwer, Dordrecht, 1993. [Aus61] J. Austin. Philosophical Papers. 1961. Reprinted in J. L. Austin, Philosophical Papers, ed. by J. O. Urmson and Geoffrey J. Warnock (Oxford, 1990). [Bea97] D. Beaver. Presupposition. In J. van Benthem and A. ter Meulen, editors, Handbook of Logic and Language, pages 939–1007. Elsevier Science B.V., 1997. [Bie01] G. Bierner. Alternative Phrases: Theoretical Analysis and Practical Application. PhD dissertation, University of Edinburgh, 2001. [Bik00] D. Bikel. A statistical model for parsing and word-sense disambiguation, 2000. [Bla87] Diane Blakemore. Semantic Constraints on Relevance. Blackwell, Oxford, 1987. [Bos95] J. Bos. Predicate logic unplugged. In P. Dekker and M. Stokhof, editors, Proceedings of the 10th Amsterdam Colloguium, pages 133–142. 1995. [BP83] J. Barwise and J. Perry. Situations and Attitudes. The MIT Press, Cambridge, MA, 1983. 285
    • [BWFP87] Susan Brennan, Marilyn Walker-Friedman, and Carl Pollard. A Centering approach to pronouns. In Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics, pages 155–162, Stanford, CA, 1987. [Byr00] D. Byron. Semantically enhanced pronouns. Proceedings of the Discourse Anaphora and Reference Resolution Conference (DAARC2000), 12, 2000.   [CFM 02] C. Creswell, K. Forbes, E. Miltsakaki, R. Prasad, B. Webber, and A. Joshi. The discourse anaphoric properties of connectives. Proceedings of DAARC, 2002.   [CFM 03] C. Creswell, K. Forbes, E. Miltsakaki, Jason Teeple, B. Webber, and A. Joshi. Anaphoric arguments of discourse connectives: Semantic properties of antecedents versus non-antecedents. Proceedings of EACL, 2003. [CFS97] A. Copestake, D. Flickinger, and I. Sag. Minimal Recursion Semantics. An Introduction. Manuscript, Stanford University, 1997. [Cho71] N. Chomsky. Deep structure, surface structure, and semantic representation. In D. Steinberg and L. Jakobovitz, editors, Semantics, pages 193–217. Cambridge University Press, Cambridge, 1971. [Cho76] N. Chomsky. Conditions on rules of grammar. Linguistic Analysis, pages 303–351, 1976. [CL93] N. Chomsky and H. Lasnik. The theory of principles and parameters. In J. Jacobs, A. von Stechow, W. Sternefeld, and T. Venneman, editors, Syntax: An International Handook of Contemporary Research, pages 506–569. Mouton de Gruyter, Berlin, 1993. [Coh84] R. Cohen. A computational theory of the function of clue words in argument understanding. Proceedings of the 10th International Conference on Computational Linguistics, pages 251–258, 1984. 286
    • [CQ52] A. Church and W. Quine. Some theorems on definability and decidability. Journal of Symbolic Logic, 17:179–187, 1952. [CW00] Stephen Clark and David Weir. A class-based probabilistic approach to structural disambiguation, 2000. [Dav67] D. Davidson. The logical form of action sentences. In N. Rescher, editor, The Logic of Decision and Action. University of Pittsburgh Press, Pittsburgh, PA, 1967. Reprinted in D. Davidson, Essays on Actions and Events, Oxford: Oxford University Press, 1982, pp. 105-122. [DH95] O. Dahl and C. Hellman. What happens when we use an anaphor. In Presentation at the XVth Scandinavian Conference of Linguistics, Oslo, Norway, 1995. [DiE89] B. DiEugenio. Clausal reference in Italian. In Proceedings of the Pennsylvania Linguistics Colloquium, Philadelphia, PA, 1989. [Dow79] D. Dowty. Word Meaning and Montague Grammar. D. Reidel, Dordrecht, 1979. [Eck98] M. Eckert. Discourse Deixis and Null Anaphora in German. PhD dissertation, University of Edinburgh, 1998. [EM90] Michael Elhadad and Kathleen McKeown. Generating connectives. In Proceedings of COLING. Helsinki, Finland, 1990. [Ern84] Thomas Ernst. Toward an Integrated Theory of Adverb Position in English. PhD dissertation, Indiana University, 1984. [ES99] M. Eckert and M. Strube. Resolving discourse deictic anaphora in dialogues. In Proceedings of the 9th Conference of the European Chapter of the Association for Computational Linguistics, Bergen, Norway, 1999. [Fir64] Jan Firbas. On defining the theme in functional sentence analysis. Travaux Linguistiques de Prague, 1:229–236, 1964. 287
    • [FM02] K. Forbes and E. Miltsakaki. Empirical studies of centering shifts and cue phrases as embedded segment boundary markers. University of Pennsylvania Working Papers in Linguistics, 7(2), 2002.   [FMP 01] K. Forbes, E. Miltsakaki, R. Prasad, A. Sarkar, B. Webber, and A. Joshi. D-LTAG system: Discourse parsing with a lexicalized tree-adjoining grammar. In ESSLLI’ 2001 Workshop on Information Structure, Discourse Structure and Discourse Semantics. Helsinki, Finland, 2001. [Fox87] Barbara Fox. Discourse Structure and Anaphora: written and conversational English. Cambride University Press, Cambridge, England, 1987. [Fra88] Bruce Fraser. Types of English discourse markers. Acta Linguistica Hungaria, 38(1):19–33, 1988. [FvG01] Anette Frank and Josef van Genabith. Gluetag: Linear logic based semantics for LTAG-and what it teaches us about LFG and LTAG. In Miriam Butt and Tracy Holloway King, editors, Proceedings of the LFG01 Conference. CSLI Publications, 2001. [FW02] K. Forbes and B. Webber. A semantic account of adverbials as discourse connectives. Philadelphia, PA, 2002. [Gar97a] C. Gardent. Interpreting focus. Presentation Slides for ESSLI 1997 Conference, http://www.coli.uni-sb.de/ claire/teaching/essll97/essli97-lect3.ps, 1997. F [Gar97b] Claire Gardent. Discourse TAG. Claus report nr.89, University of the Saarland, Saarbrucken, 1997. [Gaw86] J. Gawron. Situations and prepositions. Linguistics and Philosophy, 9(3):327–382, 1986. [Gaz79a] G. Gazdar. Pragmatics: Implicature, Presupposition and Logical Form. Academic Press, New York, 1979. 288
    • [Gaz79b] G. Gazdar. A solution to the projection problem. In Oh and Dineen, editors, Syntax and Semantics 11: Presupposition, pages 57–89. Academic Press, New York, 1979. [Gaz99] Gerald Gazdar. http://www.cogs.susx.ac.uk/lab/nlp/gazdar/teach/nlp/nlp.html, 1999. [GC00] S. Gustafson-Capova. The influence of prosodic prominence on the interpretation of ambiguous anaphors in Swedish. Proceedings of the Discourse Anaphora and Reference Resolution Conference (DAARC2000), 12, 2000. [GHZ93] J. Gundel, N. Hedberg, and R. Zacharski. Cognitive status and the form of referring expressions in discourse. Language, 69, 1993. [gra] http://www.edufind.com/english/grammar/DETERMINERS1.cfm. [Gre69] S. Greenbaum. Studies in English Adverbials Usage. Longmans, London, 1969. [Gri75] H. P. Grice. Logic and conversation. In P. Cole and J. Morgan, editors, Syntax and Semantics, vol. 3, pages 41–58. Academic Press, 1975. [Gri89] H. P. Grice. Studies in the Way of Words. Cambridge, MA, 1989. Harvard University Press, A summary of Grice’s work can be found at: http://www.artsci.wustl.edu/ philos/MindDict/grice.html. F [Gro99] The XTAG Research Group. A Lexicalized Tree Adjoining Grammar for English. http://www.cis.upenn.edu/ xtag, 1999. F [GS86] B. Grosz and C. Sidner. Attention, intention and the structure of discourse. Journal of Computational Linguistics, 12:175–204, 1986. [Hal67] M. Halliday. Notes on transitivity and theme in English. Journal of Linguistics, 3, 1967. [Har99] D. Hardt. Dynamic interpretation of verb phrase ellipsis. Linguistics and Philisophy, 22:187–221, 1999. 289
• [Hei82] Irene Heim. Semantics of Definite and Indefinite Noun Phrases. PhD dissertation, University of Massachusetts, 1982. [HH76] M. Halliday and R. Hasan. Cohesion in English. Longman, London, 1976. [Hir91] Julia Hirschberg. A Theory of Scalar Implicature. Garland Publishing Company, New York, 1991. Published version of Ph.D. Dissertation, University of Pennsylvania, 1985. [HK98] Irene Heim and Angelika Kratzer. Semantics in Generative Grammar. Blackwell, 1998. [Hob85] Jerry Hobbs. Ontological promiscuity. In Proceedings of the 23rd Annual Meeting of the Association for Computational Linguistics, pages 61–69. Palo Alto, CA, 1985. [Hob90] Jerry Hobbs. Literature and cognition. CSLI Lecture Notes, 21, 1990. [Hor69] L. Horn. A presuppositional theory of ‘only’ and ‘even’. CLS 5, Chicago Linguistics Society, 1969. [Hor96] L. Horn. Presupposition and implicature. In S. Lappin, editor, The Handbook of Contemporary Semantic Theory, pages 299–319. Blackwell, Oxford, 1996. [Hov90] Eduard Hovy. Parsimonious and profligate approaches to the question of discourse structure relations. In Proceedings of the Fifth International Workshop on Natural Language Generation, pages 128–136. 1990. [Hov93] Eduard Hovy. Automated discourse generation using discourse structure relations. Artificial Intelligence, 63:69–142, 1993. [Hov95] Eduard Hovy. The multifunctionality of discourse markers. In Proceedings of the Workshop on Discourse Markers. Holland, January, 1995. [HSAM93] Jerry Hobbs, Mark Stickel, Douglas Appelt, and Paul Martin. Interpretation as abduction. Artificial Intelligence, pages 69–142, 1993. [Hum48] David Hume. An Inquiry Concerning Human Understanding. The Liberal Arts Press, New York, 1955 edition, 1748. 290
• [Jac72] R. Jackendoff. Semantic Interpretation in Generative Grammar. MIT Press, Cambridge, MA, 1972. [Jac90] R. Jackendoff. Semantic structures. In Current Studies in Linguistics Series. Cambridge, MA, 1990. [JKR03] Aravind Joshi, Laura Kallmeyer, and Maribel Romero. Flexible composition in LTAG: Quantifier scope and inverse linking. In Proceedings of the International Workshop on Compositional Semantics. Tilburg, The Netherlands, 2003. [Joh95] Michael Johnston. When-clauses, adverbs of quantification, and focus. In Proceedings of WCCFL 13, Stanford Linguistics Association, CSLI, Stanford University, 1995. [Jos87] Aravind Joshi. An introduction to tree adjoining grammar. In Alexis Manaster-Ramer, editor, Mathematics of Language, pages 87–114. John Benjamins, Amsterdam, 1987. [JR98] J. Jayez and R. Rossari. Discourse relations versus discourse marker relations. In Proceedings of the ACL Workshop on Discourse Relations and Discourse Markers, pages 72–78. Montreal, Canada, 1998. [JVS99] A. Joshi and K. Vijay-Shanker. Compositional semantics with lexicalized tree-adjoining grammar (LTAG): How much underspecification is necessary? In Alexis Manaster-Ramer, editor, Proceedings of the Third International Workshop on Computational Semantics. Tilburg, Netherlands, January, 1999. [Kal02] Laura Kallmeyer. Using an enriched TAG derivation structure as basis for semantics. In Proceedings of the Sixth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+6), pages 101–110. University of Venice, 2002. [Kam79] H. Kamp. Events, instants, and temporal reference. In R. Baeuerle et al., editor, Semantics from Different Points of View. Springer Verlag, Berlin, 1979. [Kar73] L. Karttunen. Presuppositions of compound sentences. Linguistic Inquiry, 4(2):169–193, 1973. 291
    • [Kas93] R. Kasper. Adjuncts in the mittelfeld. In J. Nerbonne, K. Netter, and C. Pollard, editors, German in Head-Driven Phrase Structure Grammar, Lecture Note Series, Chapter 4, pages 39–69. Utrecht, The Netherlands, 1993. [Keh95] Andrew Kehler. Interpreting Cohesive Forms in the Context of Discourse Inference. PhD dissertation, Harvard University, 1995. [KJ99] Laura Kallmeyer and Aravind Joshi. Factoring predicate argument and scope semantics: Underspecified semantics with LTAG. In Paul Dekker, editor, Proceedings of the 12th Amsterdam Colloquium, pages 169–174. Amsterdam, December, 1999. To appear in the Journal of Language and Computation, 2002. [KK70] P. Kiparsky and C. Kiparsky. Fact. In M. Bierwisch and K. Heidolph, editors, Progress in Linguistics, pages 143–13. Mouton, The Hague, 1970. Also in (Petofi and Franck, 1973). [KKR91] R. Kittredge, T. Korelsky, and O. Rambow. On the need for domain communication knowledge. Computational Intelligence, 7:305–314, 1991. [KKW01a] I. Kruijff-Korbayova and B. Webber. Concession, implicature, and alternative sets. In Proceedings of the International Workshop on Computational Semantics (IWCS-4), Tilburg, January, 2001. [KKW01b] I. Kruijff-Korbayova and B. Webber. Information structure and the interpretation of ‘otherwise’. In ESSLLI 2001 Worshop on Information Structure, Discourse Structure, and Discourse Semantics, pages 61–78, Helsinki, Finland, 2001. [Kno96] Ali Knott. A Data-Driven Methodology For Motivating A Set of Coherence Relations. PhD dissertation, University of Edinburgh, 1996. [Kon91] Ekkehard Konig. The Meaning of Focus Particles: A Comparative Perspective. Routledge, London, 1991. 292
• [KOOM01] A. Knott, J. Oberlander, M. O’Donnell, and C. Mellish. Beyond Elaboration: The interaction of relations and focus in coherent text. In J. Schilperoord, T. Sanders, and W. Spooren, editors, Text representation: linguistic and psycholinguistic aspects, pages 181–196. Benjamins, 2001. [KP79] L. Karttunen and S. Peters. Conventional implicature. In C. Oh and D. Dinneen, editors, Syntax and Semantics: Presupposition, volume 11, pages 1–56. Academic Press, New York, 1979. [KP02] Paul Kingsbury and Martha Palmer. From Treebank to Propbank. In Proceedings of the Third International Conference on Language Resources and Evaluation, LREC-02. Las Palmas, Canary Islands, Spain, May 28-June 3, 2002. [KR93] H. Kamp and U. Reyle. From Discourse to Logic. Kluwer Academic Publishers, 1993. [Kra89] A. Kratzer. An investigation of the lumps of thought. Linguistics and Philosophy, 12, 1989. [Kri] M. Krifka. Focus. http://cognet.mit.edu/MITECS/Entry/krifka2. [Kri92] M. Krifka. A compositional semantics for multiple focus constructions. Cornell Working Papers in Linguistics, 10:127–158, 1992. [LA90] D. Litman and J. Allen. Discourse processing and commonsense plans. In P. Cohen, J. Morgan, and M. Pollack, editors, Intentions in Communication. MIT Press, 1990. [LA93] A. Lascarides and N. Asher. Temporal interpretation, discourse relations and commonsense entailment. Linguistics and Philosophy, 16:437–493, 1993. [LA99] A. Lascarides and N. Asher. Cognitive states, discourse structure, and the content of dialogue. In Proceedings of Amstelogue. 1999. [Lad66] R. Ladd. Intonational Phonology. Cambridge University Press, Cambridge, MA, 1966. [Lag98] Luuk Lagerwerf. Causal Connectives Have Presuppositions. Holland Academic Graphics, The Hague, 1998. 293
    • [Lak74] R. Lakoff. Remarks on this and that. In Papers from the Tenth Regional Meeting of the Chicago Linguistic Society. 1974. [Lit98] K. Litkowski. Analysis of subordinating conjunctions. CL Research Technical Report (Draft) 98-01, CL Research, Gaithersburg, MD, 1998. http://www.clres.com/onlinepapers/sc.html. [LO93] A. Lascarides and J. Oberlander. Temporal connectives in a discourse context. In Proceedings of the 6th International Conference of the European Chapter of the Association for Computational Linguistics, pages 260–268. Utrecht, The Netherlands, 1993. [Lon83] Robert Longacre. The Grammar of Discourse. Plenum Press, New York, 1983. [Lyo77] J. Lyons. Semantics. Cambridge University Press, Cambridge, MA, 1977. [Mar92] J.R. Martin. English Text: System and Structure. Benjamin, Amsterdam, 1992. [Mar97] Daniel Marcu. Instructions for manually annotating the discourse structure of texts. http://www.isi.edu/ marcu/software.html, 1997. F [Mar99] Daniel Marcu. A formal and computational synthesis of Grosz and Sidner’s and Mann and Thompson’s theories. In Workshop on Levels of Representation in Discourse. Edinburgh, Scotland, 1999. [Mar00] Daniel Marcu. The rhetorical parsing of unrestricted texts: A surface-based approach. Computational Linguistics, 25(3):395–448, 2000. [McC88] J.D. McCawley. The Syntactic Phenomena of English. The University of Chicago Press, 1988. [McL01] Mark McLauchlan. Maximum entropy models and prepositional phrase ambiguity, 2001. [MG82] Sally McConell-Ginet. Adverbs and logical form. Language, 58:144–184, 1982. 294
    • [Mit82] Anita Mittwoch. On the difference between eating and eating something: Activities versus accomplishments. Linguistic Inquiry, 13, 1982. [Mod01] N. Modjeska. Towards a resolution of comparative anaphora: A corpus study of ‘other’. In PAPACOL. Italy, 2001. [Mon74] R. Montague. On the nature of certain philosophical entities. In R. Thomason, editor, Formal Philosophy, pages 148–187. Yale University Press, New Haven, CT, 1974. [Moo93] R. Moore. Events, situations and adverbs. In R. Weischedel and M. Bates, editors, Challenges in Natural Language Processing. Cambridge University Press, Cambridge, 1993. [MP92] Johanna Moore and Martha Pollack. A problem for RST: The need for multi-level discourse analysis. Computational Linguistics, 18(4):537–544, 1992. [MP93] Johanna Moore and Cecile Paris. Planning text for advisory dialogues: Capturing intentional and rhetorical information. Computational Linguistics, 19(4):651–694, 1993. [MS88] Marc Moens and Mark Steedman. Temporal ontology and temporal reference. Journal of Computational Linguistics, 14(2):15–28, 1988. [MT88] William Mann and Sandra Thompson. Rhetorical Structure Theory: Toward a functional theory of text organization. Text, 8(3):243–281, 1988. [MTC95] Gail Mauner, Michael Tanenhaus, and Greg Carlson. Implicit arguments in sentence processing. Journal of Memory and Language, 34:357–382, 1995. [MW] Online Miriam-Webster Dictionary, http://www.m-w.com/home.htm. [Nun79] G. Nunberg. The non-uniqueness of semantic solutions: Polysemy. Linguistics and Philosophy, 3, 1979. [ODA93] W. O’Grady, M. Dobrovolsky, and M. Aronoff. Linguistics: An Introduction. St. Martin’s Press, New York, 1993. 295
[Par84] Barbara Partee. Nominal and temporal anaphora. Linguistics and Philosophy, 7:243–286, 1984.
[Pas91] R. Passonneau. Some facts about centers, indexicals, and demonstratives. Proceedings of the 29th Annual Meeting of the ACL, 1991.
[Per90] Fernando Pereira. Categorial semantics and scoping. Computational Linguistics, 16(1):1–10, 1990.
[PH90] J. Pierrehumbert and J. Hirschberg. The meaning of intonation contours in the interpretation of discourse. In P. Cohen, J. Morgan, and M. Pollack, editors, Intentions in Communication, pages 271–312. MIT Press, Cambridge, MA, 1990.
[Pol96] Livia Polanyi. The Linguistic Structure of Discourse. CSLI Technical Report, Stanford, CA, 1996.
[Pri81] E. Prince. Toward a taxonomy of given/new information. In P. Cole, editor, Radical Pragmatics. Academic Press, NY, 1981.
[Pri86] E. Prince. On the syntactic marking of presupposed open propositions. In A. Farley, P. Farley, and K. McCullough, editors, Papers from the Parasession on Pragmatics and Grammatical Theory, 22nd Regional Meeting, pages 208–222. Chicago Linguistic Society, 1986.
[PS87] C. Pollard and I. Sag. Information Based Syntax and Semantics, Volume 1: Fundamentals. Center for the Study of Language and Information, Stanford, CA, 1987.
[PSvdB94] H. Prust, R. Scha, and M. van den Berg. Discourse grammar and verb phrase anaphora. Linguistics and Philosophy, 17:261–327, 1994.
[PT] Penn Treebank. See http://www.ldc.upenn.edu/ldc/online/treebank/ for documentation.
[Pul97] S. Pulman. Higher order unification and the interpretation of focus. Linguistics and Philosophy, 20:73–115, 1997.
[PvdB96] Livia Polanyi and Martin van den Berg. Discourse structure and discourse interpretation. In P. Dekker and M. Stokhof, editors, Proceedings of the Tenth Amsterdam Colloquium. ILLC, Amsterdam, 1996.
[PvdB99] Livia Polanyi and Martin van den Berg. Logical structure and discourse anaphora resolution. In D. Cristea, N. Ide, and D. Marcu, editors, Proceedings of the Workshop on the Relation of Discourse/Dialogue Structure and Reference. 37th Annual Meeting of the Association of Computational Linguistics, 1999.
[QGLS85] R. Quirk, S. Greenbaum, G. Leech, and J. Svartvik. A Comprehensive Grammar of the English Language. Longman, London, 1985.
[Qui72] R. Quirk. A Grammar of Contemporary English. Longman, London, 1972.
[Red90] Gisela Redeker. Ideational and pragmatic markers of discourse structure. Journal of Pragmatics, 14:367–381, 1990.
[Roo85] M. Rooth. Association with Focus. PhD dissertation, University of Massachusetts, Amherst, 1985.
[Roo92] M. Rooth. A theory of focus interpretation. Natural Language Semantics, 1:75–116, 1992.
[Roo95a] M. Rooth. Focus. In S. Lappin, editor, Handbook of Contemporary Semantic Theory, pages 271–298. Blackwell, London, 1995.
[Roo95b] M. Rooth. Indefinites, adverbs of quantification, and focus semantics. In G. Carlson, editor, Generics. 1995.
[Sac77] E. Sacerdoti. A Structure for Plans and Behavior. Elsevier, Amsterdam, 1977.
[Sae96] K. Saeboe. Anaphoric presuppositions and zero anaphora. Linguistics and Philosophy, 19:187–209, 1996.
[SBDP00] Matthew Stone, Tonia Bleam, Christine Doran, and Martha Palmer. Lexicalized grammar and the description of motion events. In Workshop TAG+5, Paris, 25-27 May, 2000.
[Sch71] P. Schreiber. Some constraints on the formation of English sentence adverbs. Linguistic Inquiry, 2(1), 1971.
[Sch85] R. Schiffman. Discourse Constraints on ‘it’ and ‘that’: A study of language use in career-counseling interviews. PhD dissertation, University of Chicago, 1985.
[Sch87] D. Schiffrin. Discourse Markers. Cambridge University Press, Cambridge, MA, 1987.
[Sch97] Frank Schilder. Tree discourse grammar, or how to get attached to a discourse. In P. Dekker and M. Stokhof, editors, Proceedings of the Second International Workshop on Computational Semantics. Tilburg, Netherlands, 1997.
[SCT+94] J. Sedivy, G. Carlson, M. Tanenhaus, M. Spivey-Knowlton, and K. Eberhard. The cognitive function of contrast sets in processing focus constructions. In P. Bosch and R. van der Sandt, editors, Focus and Natural Language Processing. IBM Deutschland Informationssysteme GmbH, Institute for Logic and Linguistics, 1994.
[SD97] Matthew Stone and Christine Doran. Sentence planning as description using tree-adjoining grammar. Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL), pages 198–205, 1997.
[SdS90] D. R. Scott and C. S. de Souza. Getting the message across in RST-based text generation. In R. Dale, C. Mellish, and M. Zock, editors, Current Research in Natural Language Generation. Academic Press, 1990.
[Sea69] J. Searle. Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press, Cambridge, MA, 1969.
[Shi86] Stuart Shieber. An Introduction to Unification-Based Approaches to Grammar. CSLI, Stanford, CA, 1986.
[SHP73] P. Sgall, E. Hajicova, and E. Penesova. Topic, Focus and Generative Semantics. Scriptor, Kronberg and Taunus, 1973.
[Sib92] P. Sibun. Generating text without trees. Computational Intelligence, 8(1):102–122, 1992.
[SIG02] Proceedings of the Third SIGdial Workshop on Discourse and Dialogue, 11-12 July, Philadelphia, PA, 2002.
[Sil76] Michael Silverstein. Shifters, linguistic categories and cultural descriptions. In K. Basso and H. Selby, editors, Meaning in Anthropology, pages 11–55. University of New Mexico Press, 1976.
[Sim00] Mandy Simons. Why some presuppositions are conversational implicatures. Handout for University of Pennsylvania Linguistics Speaker Series Talk, November 9, 2000. Dept. of Philosophy, Carnegie Mellon University.
[SP88] R. Scha and L. Polanyi. An augmented context free grammar for discourse. In Proceedings of the 12th International Conference on Computational Linguistics, pages 22–27. COLING, 1988.
[SSN93] Ted Sanders, Wilbert Spooren, and Leo Noordman. Coherence relations in a cognitive theory of discourse representation. Cognitive Linguistics, 4(2):93–133, 1993.
[Sta74] R. Stalnaker. Pragmatic presuppositions. In M. Munitz and P. Unger, editors, Semantics and Philosophy, pages 197–214. New York University Press, 1974.
[Ste96] M. Steedman. Surface Structure and Interpretation. The MIT Press, Cambridge, MA, 1996.
[Ste00a] M. Steedman. Information structure and the syntax-phonology interface. Linguistic Inquiry, 31(4):649–689, 2000.
[Ste00b] M. Steedman. The productions of time: temporality and causality in linguistic semantics. Draft, 2000.
[Ste00c] M. Steedman. The Syntactic Process. The MIT Press, Cambridge, MA, 2000.
[Sto94] M. Stone. Discourse deixis, discourse structure, and the semantics of subordination. Ms., 1994.
[Str59] P. Strawson. On referring. Mind, 235:320–344, 1959.
[Suz97] H. Suzuki. Notes on the grammar of -ly adverbs in English. Typological Investigation of Languages and Cultures of the East and West, 1997.
[SW86] D. Sperber and D. Wilson. Relevance. Harvard University Press, Cambridge, MA, 1986.
[Swa88] T. Swan. Sentence Adverbials in English: A Synchronic and Diachronic Investigation. Novus, Tromso-Studier i Sprakvitenskap X, Oslo, 1988.
[Umb02] Carla Umbach. Contrast and contrastive topic. In Ivana Kruijff-Korbayova and Mark Steedman, editors, ESSLLI 2001 Workshop on Information Structure, Discourse Structure and Discourse Semantics, 2002.
[vD79] T. van Dijk. Pragmatic connectives. Journal of Pragmatics, 3:447–456, 1979.
[vdB96] Martin van den Berg. Discourse grammar and dynamic logic. In P. Dekker and M. Stokhof, editors, Proceedings of the Tenth Amsterdam Colloquium. ILLC, Amsterdam, 1996.
[vdG69] G. von der Gabelentz. Ideen zu einer vergleichenden Syntax. Wort und Satzstellung, Zeitschrift für Völkerpsychologie und Sprachwissenschaft 6, 1869.
[vdS92] R. van der Sandt. Presupposition projection as anaphora resolution. Journal of Semantics, 9:333–377, 1992.
[Ven67] Z. Vendler. Linguistics in Philosophy. Cornell University Press, Ithaca, NY, 1967.
[Ver97] Cornelia Maria Verspoor. Contextually-Dependent Lexical Semantics. PhD dissertation, University of Edinburgh, 1997.
[vH00] K. von Heusinger. Accessibility, discourse anaphora, and descriptive content. Proceedings of the Discourse Anaphora and Reference Resolution Conference (DAARC2000), 12, 2000.
[vS82] A. von Stechow. Structured propositions. Arbeitspapier 59 des Sonderforschungsbereichs 99. Universität Konstanz, 1982.
[Wal93] Marilyn Walker. Informational Redundancy and Resource Bounds in Dialogue. PhD dissertation, University of Pennsylvania, 1993.
[web] http://cctc.commnet.edu/grammar/adverbs.htm, http://www.cis.upenn.edu/~xtag/tech-report/node173.html, http://faculty.washington.edu/~marynell/grammar/AdverbPl.html.
[Web88] B. Webber. Discourse deixis: Reference to discourse segments. Proceedings of the 26th Annual Meeting of the Association for Computational Linguistics, 1988.
[Web91] B. Webber. Structure and ostension in the interpretation of discourse deixis. Language and Cognitive Processes, 6(2), 1991.
[WJ98] B. Webber and A. Joshi. Anchoring a lexicalized tree-adjoining grammar for discourse. In Proceedings of the Coling/ACL Workshop on Discourse Relations and Discourse Markers, pages 86–92. Montreal, Canada, 1998.
[WJP81] M. Walker, A. Joshi, and E. Prince. Centering in naturally occurring discourse: An overview. In M. Walker, A. Joshi, and E. Prince, editors, Centering Theory in Discourse. Clarendon Press, Oxford, 1981.
[WJSK99] B. Webber, A. Joshi, M. Stone, and A. Knott. Discourse relations: A structural and presuppositional account using lexicalised TAG. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 41–48. College Park, MD, 1999.
[WJSK03] Bonnie Webber, Aravind Joshi, Matthew Stone, and Alistair Knott. Anaphora and discourse semantics. To appear in Computational Linguistics, 2003.
[WKJ99] B. Webber, A. Knott, and A. Joshi. Multiple discourse connectives in a lexicalised grammar for discourse. In Proceedings of the 3rd International Workshop on Computational Semantics, pages 309–325. Tilburg, The Netherlands, 1999.
[WN98] Five Papers on WordNet, 1998. ftp://ftp.cogsci.princeton.edu/pub/wordnet/5papers.ps.