Biemann ibm cog_comp_jan2015_noanim

Cognitive Systems Institute Group Speaker Series Jan 15, 2015 call with presenter Chris Biemann on Adaptive Natural Language Processing


  1. 1. Cognitive Systems Institute External Speaker Series, January 15, 2015. Chris Biemann (biem@cs.tu-darmstadt.de): Adaptive Natural Language Processing
  2. 2. Natural Language Understanding – the key to intelligent behavior
      • Most information and knowledge is encoded in unstructured form in natural language
      • When humans learn about a new topic, they read about it – machines should do the same
      • Natural language content on the internet is growing constantly
      • Natural language is evolving, and natural language processing should account for that
      Cognitive computing: Cognitive computing systems learn and interact naturally with people to extend what either humans or machine could do on their own. They help human experts make better decisions by penetrating the complexity of Big Data. http://www.research.ibm.com/cognitive-computing
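  6. 6. Why Language is difficult ..
      • He sat on the river bank and counted his dough.
      • She went to the bank and took out some money.
      Lexical Layer vs. Concept Layer: "bank" is polysemous (one word, several concepts); "dough" and "money" are synonymous (several words, one concept).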
  9. 9. Why Not To Use Dictionaries or Ontologies
      Advantages:
      • Sense inventory given
      • Linking to concepts
      • Full control
      Disadvantages:
      • Dictionaries have to be created
      • Dictionaries are incomplete
      • Language changes constantly: new words, new meanings …
      “give a man a fish and you feed him for a day…
      Photo by zeh fernando under Creative Commons licence. http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
  10. 10. Structure Discovery Paradigm
      … teach a man to fish and you feed him for a lifetime”
      Consequences:
      • Only raw text input required
      • No fine-grained control on categories
      • Cognitive system: learns from and adapts to data
      Diagram: Text Data → SD algorithms (find regularities by analysis; annotate data with regularities) → Task (use annotations as features)
  11. 11. The JoBimText project – www.jobimtext.org
      Partners:
      • Lead at IBM: Alfio Gliozzo, IBM Watson DeepQA, Yorktown, NY, USA
      • Lead at TU DA: Chris Biemann, Language Technology, TU Darmstadt, Germany
      Software Capabilities:
      • Compute a Distributional Thesaurus
      • Compute Sense Representations
      • 2-Dimensional Text: Contextualized Expansion
      • RESTful API and Web Demo
      Features:
      • Scalable architecture
      • Open Source, ASL 2.0
  12. 12. 2D Text: Matching Meaning beyond Keywords
      • Where was the first professor for electric science established?
      • In 1883 the first faculty for electrical engineering was founded there.
      (almost no word overlap)
  13. 13. 2D Text: Matching Meaning beyond Keywords
      • Where was the first professor for electric science established?
      • In 1883 the first faculty for electrical engineering was founded there.
      Expansion lists shown on the slide:
      • teacher, professor, student, graduate, alumnus, staff, campus
      • electric, mechanical, thermal, electronic, industrial, optical, automotive
      • science, sciences, biology, physics, economics, mathematics, psychology
      • co-found, form, establish, own, join, rename, bear
      • director, emeritus, dean, lecturer, president, psychologist, historian
      • electrical, heavy-duty, antique, battery-powered, electronic, stainless, diesel
      • biology, economics, sciences, mathematics, physics, math, psychology
      • create, form, set, maintain, found, abolish, strengthen
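The matching idea on this slide can be sketched in a few lines of Python. This is only an illustration, not the JoBimText implementation: the `THESAURUS` dictionary, the `expand` helper, and in particular the assignment of each expansion list to a specific word are assumptions made for the example (the transcript does not say which list belongs to which word), and surface forms are used as keys for simplicity.

```python
# Toy thesaurus built from the expansion lists on the slide.
# ASSUMPTION: which list belongs to which word is guessed for illustration.
THESAURUS = {
    "professor": "director emeritus dean lecturer president psychologist historian".split(),
    "faculty": "teacher professor student graduate alumnus staff campus".split(),
    "electric": "electrical heavy-duty antique battery-powered electronic stainless diesel".split(),
    "electrical": "electric mechanical thermal electronic industrial optical automotive".split(),
    "science": "biology economics sciences mathematics physics math psychology".split(),
    "engineering": "science sciences biology physics economics mathematics psychology".split(),
    "established": "create form set maintain found abolish strengthen".split(),
    "founded": "co-found form establish own join rename bear".split(),
}


def expand(tokens, thesaurus):
    """Return the tokens plus all of their thesaurus expansions as a set."""
    expanded = set()
    for token in tokens:
        expanded.add(token)
        expanded.update(thesaurus.get(token, ()))
    return expanded


# Content words of the two sentences (hand-picked for the example).
question = ["first", "professor", "electric", "science", "established"]
passage = ["first", "faculty", "electrical", "engineering", "founded"]

keyword_overlap = set(question) & set(passage)
expanded_overlap = expand(question, THESAURUS) & expand(passage, THESAURUS)

print("keyword overlap: ", sorted(keyword_overlap))
print("expanded overlap:", sorted(expanded_overlap))
```

With plain keywords the two sentences share only "first"; after expansion they also meet on terms such as "professor", "electric" and "science", which is the effect the slide calls matching beyond keywords.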
  15. 15. Sipping cappuccino ..
  16. 16. .. in Milan.
  18. 18. Clustering of DT entries: Sense Induction
      Examples: bright#JJ, paper#NN
      C. Biemann (2006): Chinese Whispers - an Efficient Graph Clustering Algorithm and its Application to Natural Language Processing Problems. Proceedings of the HLT-NAACL-06 Workshop on Textgraphs-06, New York, USA.
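A minimal sketch of Chinese Whispers style label propagation, to show how clustering the neighbours of a word can yield senses. It is a simplified reading of the cited algorithm rather than the authors' code: every node starts in its own class and repeatedly adopts the class with the highest summed edge weight among its neighbours. The function name, the parameters, the tie-breaking rule and the toy neighbour graph for "paper" are all assumptions made for the example.

```python
import random
from collections import defaultdict


def chinese_whispers(edges, iterations=20, seed=0):
    """Cluster an undirected weighted graph given as (u, v, weight) triples."""
    rng = random.Random(seed)
    graph = defaultdict(dict)
    for u, v, weight in edges:
        graph[u][v] = weight
        graph[v][u] = weight

    labels = {node: node for node in graph}  # every node starts as its own class
    nodes = sorted(graph)
    for _ in range(iterations):
        rng.shuffle(nodes)  # visit nodes in random order
        for node in nodes:
            scores = defaultdict(float)
            for neighbour, weight in graph[node].items():
                scores[labels[neighbour]] += weight
            # Adopt the strongest class; ties go to the alphabetically first label
            # (a simplification chosen here to keep the sketch deterministic).
            labels[node] = max(sorted(scores), key=scores.get)
    return labels


# Hypothetical similarity graph among neighbours of "paper" (made-up weights).
edges = [
    ("newspaper", "magazine", 1.0), ("newspaper", "article", 1.0), ("magazine", "article", 1.0),
    ("plastic", "cardboard", 1.0), ("plastic", "glass", 1.0), ("cardboard", "glass", 1.0),
]
labels = chinese_whispers(edges)

clusters = defaultdict(list)
for word, label in labels.items():
    clusters[label].append(word)
print(sorted(sorted(members) for members in clusters.values()))
```

Each resulting cluster stands for one induced sense; no sense inventory has to be supplied in advance.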
  19. 19. Features for Disambiguation
      paper 0 (newspaper): read#VB#-dobj 45, reading#VBG#-dobj 45, write#VB#-dobj 38, read#VBD#-dobj 37, writing#VBG#-dobj 36, wrote#VBD#-dobj 34, original#JJ#amod 27, wrote#VBD#-prep_in 26, recent#JJ#amod 26, published#VBN#partmod 25, written#VBN#-dobj 23, published#VBN#-nsubjpass 20, published#VBD#-dobj 19, copy#NN#-prep_of 18, said#VBD#-prep_in 18, author#NN#-prep_of 17, pages#NNS#-prep_of 16, told#VBD#-dobj 15, buy#VB#-dobj 14, published#VBN#-prep_in 14, page#NN#-prep_of 14
      paper 1 (material): piece#NN#-prep_of 21, pieces#NNS#-prep_of 17, made#VBN#-prep_from 13, bags#NNS#-nn 11, white#JJ#amod 9, paper#NN#-conj_and 9, glass#NN#-conj_and 9, products#NNS#-nn 9, industry#NN#-nn 8, plastic#NN#conj_and 8, plastic#NN#-conj_and 8, bits#NNS#-prep_of 8, bag#NN#-nn 8, plastic#NN#conj_or 8, sheet#NN#-prep_of 7, recycled#JJ#amod 7, tons#NNS#-prep_of 7, glass#NN#conj_and 7, buy#VB#-dobj 6, plates#NNS#-nn 6, pile#NN#-prep_of 6
      These are shared by paper and the cluster members.
      Disambiguation: find features in context.
      Example: I am reading an original paper on the paper.
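"Find features in context" can be illustrated with a small scoring function. The sketch below uses a subset of the feature counts from the slide; summing the counts of matching features is an assumed scoring rule chosen for simplicity, and the context feature sets are written by hand here (in practice they would come from a dependency parser). `SENSE_FEATURES` and `disambiguate` are hypothetical names.

```python
# Subset of the per-sense features and counts listed on the slide.
SENSE_FEATURES = {
    "paper 0 (newspaper)": {
        "read#VB#-dobj": 45, "reading#VBG#-dobj": 45, "write#VB#-dobj": 38,
        "original#JJ#amod": 27, "recent#JJ#amod": 26, "author#NN#-prep_of": 17,
    },
    "paper 1 (material)": {
        "piece#NN#-prep_of": 21, "pieces#NNS#-prep_of": 17, "made#VBN#-prep_from": 13,
        "white#JJ#amod": 9, "sheet#NN#-prep_of": 7, "recycled#JJ#amod": 7,
    },
}


def disambiguate(context_features, sense_features=SENSE_FEATURES):
    """Return the sense whose features best match the context.

    ASSUMPTION: the score is the sum of counts of matching features.
    """
    scores = {
        sense: sum(count for feature, count in features.items() if feature in context_features)
        for sense, features in sense_features.items()
    }
    return max(scores, key=scores.get), scores


# Hand-written context features for "... reading an original paper ...".
sense, scores = disambiguate({"reading#VBG#-dobj", "original#JJ#amod"})
print(sense, scores)

# Hand-written context features for a phrase like "a piece of recycled paper".
sense_material, _ = disambiguate({"piece#NN#-prep_of", "recycled#JJ#amod"})
print(sense_material)
```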
  20. 20. Paraphrasing with JoBimText
  22. 22. JoBimText Model example “beetle”
      S. Mitra, R. Mitra, M. Riedl, C. Biemann, A. Mukherjee, P. Goyal (2014): That’s sick dude!: Automatic identification of word sense change across different timescales. Proceedings of ACL-2014, Baltimore, MD, USA
      http://www.thezooom.com/2013/01/10749/
  24. 24. Outlook: From Similarities and Relations…
      Cathy liked the blue dress very much. She bought it for 15 Euros from the shop.
      1: ENTITIES
      • FIRSTNAME: Pat, Brian, Kevin
      • COLOR: red, purple, green
      • CLOTHING: gown, skirt, blouse
      • MONEY: currency, greenback, yen
      • SALESPOINT: store, restaurant, boutique
      2: RELATIONS
      • HAS-PROPERTY
  25. 25. Sneak Preview: Induction of Relations
      • JoBimText model on pairs and paths between pairs
  26. 26. … to Frames and Causality
      Cathy liked the blue dress very much. She bought it for 15 Euros from the shop.
      Annotations: FIRSTNAME, COLOR, CLOTHING, HAS-PROPERTY, MONEY, SALESPOINT
      3: FRAMES
      • POSITIVE-OPINION-ABOUT: subj=FIRSTNAME, obj=CLOTHING (patterns: FIRSTNAME adored CLOTHING; FIRSTNAME found CLOTHING great)
      • VERKAUFSVORGANG (German for "sales transaction"): subj=AGENT, obj=THING, für (for)=MONEY, loc=SALESPOINT
      • Fillers shown for each frame: FIRSTNAME = Cathy, CLOTHING = dress
      4: CAUSALITY
  27. 27. Sneak Preview: Frame Induction
  28. 28. Applications of JoBimText in IBM Watson
      • JoBimText informs relation extraction: significant improvements in EMRA application, e.g. for finding drug prescriptions for diseases
      • JoBimText sense clusters are being used to inform term matching, e.g. when finding justifications for answers
      • JoBimText is one of the solutions for knowledge induction from text in new domains
  30. 30. Conclusion
      • The role of Natural Language Processing in Cognitive Computing is two-fold:
        - the technology for natural interaction with the system
        - a technology subject to be framed in the cognitive paradigm
      • Adaptive Natural Language Processing
        - makes use of static AND dynamically generated resources
        - is driven by (text) data that defines its application domain
        - accounts for language evolution and new meanings by adaptation to the data
        - beyond NLP pipelines
  31. 31. Thanks .. and now some (deep) QA!
      www.jobimtext.org
      Special Track: Semantic and Cognitive Computing
  33. 33. The @-ing (‘holing’) operation: producing pairs of Jos and Bims
      SENTENCE: I suffered from a cold and took aspirin.
      STANFORD COLLAPSED DEPENDENCIES: nsubj(suffered, I); nsubj(took, I); root(ROOT, suffered); det(cold, a); prep_from(suffered, cold); conj_and(suffered, took); dobj(took, aspirin)
      WORD-CONTEXT PAIRS:
      Jo (word)   Bim (context)              count
      suffered    nsubj(@@, I)               1
      took        nsubj(@@, I)               1
      cold        det(@@, a)                 1
      suffered    prep_from(@@, cold)        1
      suffered    conj_and(@@, took)         1
      took        dobj(@@, aspirin)          1
      I           nsubj(suffered, @@)        1
      I           nsubj(took, @@)            1
      a           det(cold, @@)              1
      cold        prep_from(suffered, @@)    1
      took        conj_and(suffered, @@)     1
      aspirin     dobj(took, @@)             1
      Parser: http://nlp.stanford.edu:8080/parser/
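The holing operation on this slide is simple enough to sketch directly. The function below takes dependency triples and emits one pair per argument position, replacing the word itself with the `@@` hole. The function name `holing` and the `(relation, head, dependent)` tuple layout are assumptions for the example; skipping the `root` relation mirrors the slide, which lists no pairs for it.

```python
def holing(dependencies):
    """Turn (relation, head, dependent) triples into (Jo, Bim) pairs.

    Each dependency yields two pairs: one with the head as the word (Jo) and
    the hole '@@' in the head slot, and one with the dependent as the word.
    """
    pairs = []
    for relation, head, dependent in dependencies:
        if relation == "root":  # the slide lists no pairs for root(ROOT, suffered)
            continue
        pairs.append((head, f"{relation}(@@, {dependent})"))
        pairs.append((dependent, f"{relation}({head}, @@)"))
    return pairs


# Collapsed dependencies for "I suffered from a cold and took aspirin."
dependencies = [
    ("nsubj", "suffered", "I"),
    ("nsubj", "took", "I"),
    ("root", "ROOT", "suffered"),
    ("det", "cold", "a"),
    ("prep_from", "suffered", "cold"),
    ("conj_and", "suffered", "took"),
    ("dobj", "took", "aspirin"),
]

pairs = holing(dependencies)
for jo, bim in pairs:
    print(f"{jo}\t{bim}\t1")
```

The output contains the same twelve word-context pairs as the slide, in a different order.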
  34. 34. Distributional Thesaurus (DT)
      • Computed from distributional similarity statistics
      • Entry for a target word consists of a ranked list of neighbors
      meeting: meeting 288, meetings 102, hearing 89, session 68, conference 62, summit 51, forum 46, workshop 46, hearings 46, ceremony 45, sessions 41, briefing 40, event 40, convention 38, gathering 36, ...
      articulate: articulate 89, explain 19, understand 17, communicate 17, defend 16, establish 15, deliver 14, evaluate 14, adjust 14, manage 13, speak 13, change 13, answer 13, maintain 13, ...
      First order (word-feature graph): words immaculate, perfect; features amod(condition,@@), amod(timing,@@), nsubj(@@,hair), cop(@@,remains), amod(Church,@@)
      Second order (word-word graph): immaculate and perfect, linked with weight 3
  35. 35. Scaling Computation with MapReduce
      Input text: Roomano is a hard Gouda-like cheese from Friesland in the northern part of The Netherlands. It pairs well with aged sherries ...
      Legend: green = steps, blue = parameters
      • Holing (using grammatical relations) and FreqSig (parameters t: min freq, s: min sign)
        word, feature, t: hard#a cheese#ADJ_MODn 17; cheese#n Gouda-like#ADJ_MODa 5; cheese#n hard#ADJ_MODa 17; pair#v well#ADV_MODa 3; ...
        word, feature, s: hard#a cheese#ADJ_MODn 15.8; cheese#n Gouda-like#ADJ_MODa 7.6; cheese#n hard#ADJ_MODa 0.4; ...
      • AggrPerFt
        feature, words: cheese#ADJ_MODn: hard#a, yellow#a, French#a; hard#ADJ_MODa: cheese#n, stone#n; ...
      • SimCounts (parameter w: weighting for # words/feature)
        word, word, w.sum: hard#a yellow#a 0.234; yellow#a hard#a 0.234; cheese#n stone#n 3.14; ...
      • PruneGraph (parameters p: max number of features per word; s (like data below))
      • Convert (sum threshold)
        ibm: i.b.m. 164, intel 154, hewlett-packard 151, dell 141, cisco 134, microsoft 125, hp 124
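The steps on this slide can be approximated in memory to see how word-feature pairs turn into thesaurus entries. The sketch below is not the MapReduce implementation: it counts pairs, groups words per feature, counts shared features per word pair, and keeps the top neighbours. The significance filter and the per-feature weighting are left out, and all names and parameters (`distributional_thesaurus`, `min_freq`, `max_words_per_feature`, `top_n`) are hypothetical. In the toy data, which three features "immaculate" and "perfect" share is an assumption.

```python
from collections import Counter, defaultdict
from itertools import permutations


def distributional_thesaurus(pairs, min_freq=1, max_words_per_feature=1000, top_n=10):
    """Build word -> ranked neighbour list from (word, feature) pairs."""
    freq = Counter(pairs)  # frequency of each (word, feature) pair

    # Group words by feature, keeping only sufficiently frequent pairs.
    words_per_feature = defaultdict(set)
    for (word, feature), count in freq.items():
        if count >= min_freq:
            words_per_feature[feature].add(word)

    # Count how many features each ordered word pair shares.
    similarity = defaultdict(Counter)
    for words in words_per_feature.values():
        if len(words) > max_words_per_feature:  # skip overly general features
            continue
        for first, second in permutations(sorted(words), 2):
            similarity[first][second] += 1

    # Keep only the top neighbours per word.
    return {word: counts.most_common(top_n) for word, counts in similarity.items()}


# Toy word-feature pairs; the choice of shared features is made up for illustration.
pairs = [
    ("immaculate", "amod(condition,@@)"),
    ("immaculate", "nsubj(@@,hair)"),
    ("immaculate", "cop(@@,remains)"),
    ("immaculate", "amod(Church,@@)"),
    ("perfect", "amod(timing,@@)"),
    ("perfect", "nsubj(@@,hair)"),
    ("perfect", "cop(@@,remains)"),
    ("perfect", "amod(Church,@@)"),
]

dt = distributional_thesaurus(pairs)
print(dt)
```

Counting shared features corresponds to the second-order view of the thesaurus: two words are similar when they occur with the same context features.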
