Karlgren

constructional syntactic analysis for information access tasks
jussi karlgren, yandex, february 16, 2011

jussi karlgren: phd in (computational) linguistics from stockholm; senior researcher in information access at sics, stockholm; docent in language technology at univ of helsinki; founding partner, gavagai ab, stockholm

sics:
  • independent non-profit research institute
  • about 100-200 researchers
  • ... networks, distributed systems, programming tools, collaborative environments, information access, design, digital art ...

gavagai:
  • recent startup company
  • about 7-8 employees
  • extracts actionable intelligence from very large text streams

why use computational methods and machinery for information access?
two reasons:
  1 amount of data is overwhelming → reduce data complexity. let's call these “simple” tasks
  2 signal is weak and complex → peer closer into data. let's call these “difficult” tasks

for the simple tasks the sensible thing to do is to pound the text into small bits and count the various types of bit.

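The "pound into bits and count" approach is plain bag-of-words counting; a minimal sketch (the tokeniser regex is an assumption, not the talk's):

```python
from collections import Counter
import re

def bag_of_words(text):
    # pound the text into small bits (tokens) and count each type of bit
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)

counts = bag_of_words("the cat sat on the mat and the cat purred")
# counts["the"] == 3, counts["cat"] == 2
```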
this works well up to a point.
for search engines.
but not for e.g. authorship attribution.

what is the next sensible thing to do?
try to organise the bits into piles first, generalising them,
or
try to see if the bits have relations to each other, building more complex structures.
which involves non-trivial complex decisions and results in a brittle, error-prone procedure.

why is parsing impractical?
  1 new text
  2 categories unfounded in data
  3 dependencies not based on necessity or efficiency

what is in the signal?
“It is this, I think, that commentators mean when they say glibly that the ‘world changed’ after Sept 11.”
words? or something more?

linguistics has an answer.
but that answer doesn’t help much in practical applications.

what is in the signal to begin with?
just words?
and a pattern?

“ah, but the words are in the clause, the pattern is only expressed by the words that participate in it.”

so patterns do not exist when not in use?
do words exist outside their usage?
“To ask where a verbal operant is when a response is not in the course of being emitted is like asking where one’s knee-jerk is when the physician is not tapping the patellar tendon.”
(B.F. Skinner, Verbal Behavior)

we claim that patterns are part of the signal,
  - not incidental to it,
  - nor secondary to the terms in it.
this appears to be a contentious statement.

radical construction grammar (cxg)
  1 syntax-lexicon continuum
  2 form and function specified in unified model
  3 structurally cohesive
(william croft, 2005)

1. syntax-lexicon continuum

  Construction type      Examples
  complex and abstract   syntax      sbj be-tense verb-en by agent
  complex and concrete   idiom       up-tense the ante
  complex and bound      morphology  noun-s
  atomic and abstract    category    adj, clause
  atomic and concrete    lexicon     this, green

all are equal: constructions are the primitive elements
→ no parts of speech, no syntactic categories necessary

2. form and function specified in unified model
→ no separate syntactic (or semantic) component necessary

3. structurally cohesive
→ everything is constructions and nothing else; everything is specific and nothing is universal

practically:
the pattern of an utterance is a feature with the same ontological status as the terms that occur in the utterance.
constructions and lexemes both have conceptual meaning.
constructions or patterns are present even without recourse to the words in them.

our claim:
to study pattern occurrences, no coupling between the features and the words carrying them needs to be done.
this is quite convenient.
which is good.

patterns, in various forms, have been used in language technology for some time:
the linguistic string project (1965-1998), Naomi Sager et al.
leading to information extraction: large numbers of ad hoc pattern descriptions, closely based on data as observed in use.

now turn to one example task: identification and analysis of attitude

attitude analysis can be done on any text source.
blogs: unfettered discourse, wom, low publication threshold, no editorial control.
but it’s new text: new processing practice necessary.

a prototypical attitudinal expression

  Expression         WHO      FEELS WHAT      ABOUT WHAT
  I like sauerkraut  I        like            sauerkraut
  Kissing is nice    ?        nice            kiss
                     someone  sentiment term  topic

is this picture true?

it is this, i think, that commentators mean when they say glibly that the ‘world changed’ after sept 11.
president hafez al-assad has said that peace was a pressing need for the region and the world at large and syria, considering peace a strategic option, would take steps towards peace.
mr cohen, beginning an eight-day european tour including a nato defence ministers’ meeting in brussels today and tomorrow, said he expected further international action soon, though not necessarily military intervention.
the designers from house on fire do not like random play.
sauerkraut is damn good but kimchi is even better.
bertram powerboats have a deep v hull and handle well in choppy sea.
m.a.k. halliday thought it natural to view syntax from a functional perspective.

our claim: attitude is not only lexical, or: lexicon is not only words & terms
  “He blew me off” vs “He blew off”
  “He has the best result, we cannot fail him” vs “This is the best coffee, we cannot fail with it”
  “Fifth Avenue”, “9/11”

an experiment

we’ll hand code a number of sample constructions to test our claim that they might be useful to identify attitudinal expressions.
remember: to study patterns, we do not need to encode explicit linkage to words!

we represent each sentence using three separate sets of features:
  I  content words
  F  form words
  K  constructions

I features
content words – nouns, adjectives, verbs (including verbal uses of participles), adverbs, abbreviations, numerals, interjections, and negation

F features
function words – prepositions, determiners, conjunctions, pronouns

K: sentence structure
transitivity, predicate, relative, and object clauses, tense shift within sentence

K: various adverbials
adverbials of location, time, manner, condition, quantity, clause adverbial, clause initial adverbials

K: morphology of sentence constituents
present or past tense, adjectives in base, comparative, or superlative form

K: word dependencies and categories
subordinate conjunctions, negations, prepositional post modifiers, verb chains, quantifiers, particle verbs, prepositional phrases, adjective modifiers

“It is this, I think, that commentators mean when they say glibly that the ‘world changed’ after Sept 11.”
  I: be think commentator mean when say glibly world change sept 11
  F: it this i that they that the after
  K: AdvlTim, AdvlMan, ObjCls, PredCls, TRIn, TRtr, TRmix, TnsPres, TnsPast, TnsShift

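One way to read the three-way split is as a featurizer over a POS-tagged sentence. A toy sketch follows: the tag names and the hand-supplied K labels are illustrative assumptions (the real K features came from a syntactic analysis); only the I/F/K partition itself follows the talk.

```python
# Toy featurizer: split a POS-tagged sentence into the talk's three
# feature sets. Tag inventory and K labels are assumed, not the talk's.
CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV", "NUM", "INTJ"}   # I features
FUNCTION_POS = {"PRON", "DET", "ADP", "CONJ"}                  # F features

def featurize(tagged_tokens, constructions):
    return {
        "I": [w for w, pos in tagged_tokens if pos in CONTENT_POS],
        "F": [w for w, pos in tagged_tokens if pos in FUNCTION_POS],
        "K": list(constructions),
    }

sent = [("it", "PRON"), ("is", "VERB"), ("this", "PRON"), ("i", "PRON"),
        ("think", "VERB"), ("that", "CONJ"), ("commentators", "NOUN"),
        ("mean", "VERB")]
feats = featurize(sent, ["ObjCls", "PredCls", "TnsPres"])
```

Note that the K set is passed in rather than computed: constructions need no explicit linkage to the words carrying them, which is exactly the point of the slide above.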
in preliminary experiments with SVM feature selection we find that several of the K features have high rank for categorisation, notably TnsShift, TnsPast, TRmix, PredCls.

TnsShift
“Noam Chomsky said[past] that what makes human language unique is[present] recursive centre embedding”
“M.A.K. Halliday believed[past] that grammar, viewed functionally, is[present] natural”
→ saves us from acquiring and maintaining lists of verbs of utterance, pronouncement, and cognition.

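A TnsShift detector can be sketched as: flag any sentence containing both a past-tense and a present-tense finite verb. Penn Treebank tags (VBD, VBZ, VBP) are an assumed interface here; the talk does not specify its analyser.

```python
# Flag a tense shift: both a past and a present finite verb in one sentence.
# Penn Treebank verb tags are an assumption, not the talk's actual analyser.
def has_tense_shift(tagged_tokens):
    tenses = {"past" if tag == "VBD" else "present"
              for _, tag in tagged_tokens if tag in {"VBD", "VBZ", "VBP"}}
    return tenses == {"past", "present"}

chomsky = [("Chomsky", "NNP"), ("said", "VBD"), ("that", "IN"),
           ("embedding", "NN"), ("is", "VBZ"), ("unique", "JJ")]
```

`has_tense_shift(chomsky)` is True: “said” is past, “is” is present, and no list of verbs of utterance or cognition is consulted.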
[figure: feature loadings along Factor 2 (13.8 %), plotting KmpAdj, AdjMod, TnsShift, Neg, RelCls, TnsPast, VChain, ObjCls, TRmix, PredCls, SupAdj, AdvlMan against the Pos / Neg / Neu / NoAtt categories]

our experiment:
  1 represent sentences using three sets of features: I, F, K.
  2 build a language representation using one year of newsprint: test for differences between KT, MD, GH.
  3 test sets of attitudinal sentences: SEMEVAL, NTCIR 6 & 7, MPQA.
  4 put test sentences in word space (random indexing, 2000 dims) with added feature indicating attitude.
  5 extract feature vector from word space and run through SVM.
  6 test with five-fold crossvalidation.

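Step 4's word space can be sketched with random indexing: every feature gets a fixed sparse ternary index vector, and a sentence vector is the sum over its features. The 2000 dimensions follow the slide; the number of nonzeros and the seed are assumptions, and the SVM and crossvalidation steps (e.g. via scikit-learn) are omitted.

```python
import numpy as np

DIM, NONZERO = 2000, 8   # 2000 dims as on the slide; 8 nonzeros is assumed
rng = np.random.default_rng(0)
_index_vectors = {}

def index_vector(feature):
    # each feature gets a fixed sparse ternary (+1/-1) random index vector
    if feature not in _index_vectors:
        v = np.zeros(DIM)
        positions = rng.choice(DIM, size=NONZERO, replace=False)
        v[positions] = rng.choice([-1.0, 1.0], size=NONZERO)
        _index_vectors[feature] = v
    return _index_vectors[feature]

def sentence_vector(features):
    # a sentence is the sum of the index vectors of its I/F/K features
    return sum((index_vector(f) for f in features), np.zeros(DIM))

v = sentence_vector(["I:think", "F:that", "K:TnsShift"])
```

Because the index vectors are sparse and random, distinct features are nearly orthogonal, so the fixed-width sum still lets a downstream classifier separate them.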
experimental data:

                   NTCIR 6   NTCIR 7   SEMEVAL    MPQA
  Attitudinal        1 392     1 075        76    6 021
  Non-attitudinal    4 416     3 201       174    4 982
  Total              5 808     4 276       250   11 003

  F1     NTCIR 6   NTCIR 7   MPQA   SEMEVAL
  I        46.1      45.2    63.4     42.4
  F        44.9      47.5    65.4     40.4
  K        42.3      43.6    63.7     33.8
  IF       45.9      47.4    67.3     41.4
  IK       45.9      48.6    67.0     38.6
  FK       46.1      47.9    68.0     37.5
  IFK      47.5      48.6    69.2     41.8

  Precision range  approx 40   approx 70   approx 30
  Recall range     approx 55-65

K features often help and never really hurt.
(karlgren et al, ECIR 2010)

SEMEVAL is different:
  Discovered Boys Bring Shock, Joy (+45)
  Iraq Car Bombings Kill 22 People, Wound more than 60 (−98)

  1 results tie with reported NTCIR and SEMEVAL results, not far from best MPQA results.
  2 combinations with K generally better than those without.
  3 SEMEVAL data: much lower results, no surprise given terseness.
  4 background language model has some effect: Glasgow Herald better precision; Korea Times better for recall for NTCIR data.

but this was all hand coded.

next (preliminary) experiment: skeletons.

put sentences in word space (random indexing, 2000 dims) with added feature for each trigram of structural terms

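The "skeleton" idea can be sketched by masking everything except structural (function) words and emitting trigrams of the result; the structural-word list below is a tiny assumed sample, not the experiment's actual inventory.

```python
# keep structural words, mask the rest, emit trigrams of the skeleton
STRUCTURAL = {"it", "is", "this", "i", "that", "the", "they", "after"}

def skeleton_trigrams(tokens):
    skeleton = [t if t in STRUCTURAL else "_" for t in tokens]
    return [" ".join(skeleton[i:i + 3]) for i in range(len(skeleton) - 2)]

grams = skeleton_trigrams("it is this i think that they say".split())
# yields "it is this", "is this i", "this i _", "i _ that", ...
```

Each trigram then becomes one more feature in the word space, with no coupling to the content words it masks.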
prove utility by better choice of task?
  • sentiment and opinion identification
  • quote identification
  • novelty detection
  • authorship attribution
  • summarisation
  • terminology mining
suggestions?

take home
  1 constructional models have suitable granularity for both simple and difficult tasks
  2 constructional models provide a simple methodology to test the effect of language structure
  3 constructional features are not subsidiary to word occurrence features
  4 constructional analysis has a long history in language technology
  5 constructional analysis has an opportunity to influence linguistics

to discuss
  1 what constitutes a construction?
  2 what is not a construction?
  3 what sort of tasks can use constructions profitably?
  4 what sort of abstractions do we want to use for describing constructions productively?
  5 how can we learn constructions automatically?
