Machine reasoning

  1. 20/08/2020 1 @FIT, Monash Uni, August 2020 A/Prof Truyen Tran Vuong Le, Thao Le, Hung Le, Dung Nguyen, Tin Pham @truyenoz truyentran.github.io truyen.tran@deakin.edu.au letdataspeak.blogspot.com goo.gl/3jJ1O0 Machine reasoning linkedin.com/in/truyen-tran
  2. 20/08/2020 2 Dual-process theories The problem Neural memories Agenda Visual reasoning
  3. 20/08/2020 3 Among the most challenging scientific questions of our time are the corresponding analytic and synthetic problems: • How does the brain function? • Can we design a machine which will simulate a brain? -- Automata Studies, 1956.
  4. 20/08/2020 4 #REF: A Cognitive Architecture Approach to Interactive Task Learning, John E. Laird, University of Michigan
  5. 1960s-1990s  Hand-crafted rules, domain-specific, logic-based  High in reasoning  Can’t scale.  Fails on unseen cases. 20/08/2020 5 2020s-2030s  Learning + reasoning, general purpose, human-like  Has contextual and common-sense reasoning  Requires less data  Adapts to change  Explainable 1990s-present  Machine learning, general purpose, statistics-based  Low in reasoning  Needs lots of data  Less adaptive  Little explanation Photo credit: DARPA
  6. Learning vs Reasoning Learning is to improve oneself by experiencing ~ acquiring knowledge & skills. Reasoning is to deduce knowledge from previously acquired knowledge in response to a query (or cues). Early theories of intelligence (a) focus solely on reasoning, and (b) assume learning can be added separately and later! (Khardon & Roth, 1997). Does learning precede reasoning, or do the two interact? Can reasoning be learnt? Are learning and reasoning indeed different facets of the same mental/computational process? 20/08/2020 6 Khardon, Roni, and Dan Roth. "Learning to reason." Journal of the ACM (JACM) 44.5 (1997): 697-725. (Dan Roth; ACM Fellow; IJCAI John McCarthy Award)
  7. Rich Sutton’s Bitter Lesson 20/08/2020 7 “The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. ” “The two methods that seem to scale arbitrarily in this way are search and learning.” http://www.incompleteideas.net/IncIdeas/BitterLesson.html
  8. Bottou’s vision of reasoning Reasoning is not necessarily achieved by making logical inferences There is a continuity between algebraically rich inference and connecting together trainable learning systems Central to reasoning is composition rules to guide the combinations of modules to address new tasks 20/08/2020 8 “When we observe a visual scene, when we hear a complex sentence, we are able to explain in formal terms the relation of the objects in the scene, or the precise meaning of the sentence components. However, there is no evidence that such a formal analysis necessarily takes place: we see a scene, we hear a sentence, and we just know what they mean. This suggests the existence of a middle layer, already a form of reasoning, but not yet formal or logical.” Bottou, Léon. "From machine learning to machine reasoning." Machine learning 94.2 (2014): 133-149.
  9. Learning to reason, formal def 20/08/2020 9 Khardon, Roni, and Dan Roth. "Learning to reason." Journal of the ACM (JACM) 44.5 (1997): 697-725. E.g., given a video f, determines if the person with the hat turns before singing.
  10. 10 What color is the thing with the same size as the blue cylinder? blue • The network guessed the most common color in the image. • Linguistic bias. • Requires multi-step reasoning: find blue cylinder ➔ locate other object of the same size ➔ determine its color (green). A testbed: Visual QA
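The multi-step program this question implicitly requires can be made concrete in a short sketch. This is not the authors' code; the scene representation is a hypothetical toy format chosen only to illustrate the three reasoning steps on the slide.

```python
# Toy scene: a hypothetical attribute-dict representation (not from the deck).
scene = [
    {"shape": "cylinder", "color": "blue",  "size": "small"},
    {"shape": "cube",     "color": "green", "size": "small"},
    {"shape": "sphere",   "color": "red",   "size": "large"},
]

def filter_objs(objs, **attrs):
    """Keep objects matching all given attribute values."""
    return [o for o in objs if all(o[k] == v for k, v in attrs.items())]

# Step 1: find the blue cylinder.
anchor = filter_objs(scene, color="blue", shape="cylinder")[0]
# Step 2: locate the *other* object of the same size.
same_size = [o for o in filter_objs(scene, size=anchor["size"]) if o is not anchor]
# Step 3: query its color.
answer = same_size[0]["color"]  # "green"
```

A network that answers "blue" has skipped steps 1 and 2 and fallen back on the linguistic bias described above.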
  11. 11 Adapted from (Aditya et al., 2019) Events, Activities Objects, Regions, Trajectories Scene Elements: Volumes, 3D Surfaces Image Elements: Regions, Edges, Textures Knowledge about shapes Qualitative spatial reasoning Relations between objects Commonsense knowledge Relations between objects Why Visual QA?
  12. 20/08/2020 12 Dual-process theories The problem Neural memories Agenda Visual reasoning
  13. Dual-process system with memories 13 System 1: Intuitive • Fast • Implicit/automatic • Pattern recognition • Multiple System 2: Analytical • Slow • Deliberate/rational • Careful analysis • Single, sequential Memories: facts, semantics, events and relational associations; working space – temporal buffer (single)
  14. System 2 Holds hypothetical thought Decoupling from representation Working memory size is not essential. Its attentional control is. 20/08/2020 14
  15. Hypotheses Reasoning as just-in-time program synthesis. It employs conditional computation. Reasoning is recursive, e.g., mental travel. 20/08/2020 15
  16. 20/08/2020 16 Dual-process theories The problem Neural memories Agenda Visual reasoning
  17. Visual recognition is a (solvable) special case 17 Image courtesy: https://dcist.com/ What is this? a cat R-CNN, YOLO,… Object recognition Object detection Where are objects, and what are they?
  18. But Image QA is challenging 18 (CLEVR, Johnson et al., 2017) (Q) How many objects are either small cylinders or metal things? (Q) Are there an equal number of large things and metal spheres? (GQA, Hudson et al., 2019) (Q) What is the brown animal sitting inside of? (Q) Is there a bag to the right of the green door?
  19. And Video QA is even more challenging 19 (Data: TGIF-QA)
  20. 20 Reasoning Computer Vision Natural Language Processing Machine learning Visual Reasoning Visual QA is a great testbed
  21. Reasoning over visual modality • Context:  Videos/Images: compositional data  Hierarchy of visual information: objects, actions, attributes, relations, commonsense knowledge.  Relation of abstract objects: spatial, temporal, semantic aspects. • Research questions:  How to represent object/action relations in images and videos?  How to reason with object/action relations? 21
  22. A simple VQA framework that works surprisingly well 22 Visual representation Question What is the brown animal sitting inside of? Textual representation Fusion Predict answer Answer box CNN Word embedding + LSTM
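The pipeline on this slide can be sketched schematically: visual features (e.g. from a CNN) and a question encoding (e.g. word embeddings + LSTM) are fused, then a classifier scores a fixed answer vocabulary. All weights below are random placeholders standing in for trained parameters; the dimensions are arbitrary toy choices, not the deck's.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_answers = 512, 1000

visual = rng.standard_normal(d)      # stand-in for CNN image features
question = rng.standard_normal(d)    # stand-in for LSTM question encoding

fused = visual * question            # simple element-wise (Hadamard) fusion
W = rng.standard_normal((n_answers, d))
scores = W @ fused                   # logits over the answer vocabulary
answer_id = int(np.argmax(scores))   # predicted answer index
```

That such a shallow fuse-and-classify scheme works "surprisingly well" is precisely why harder benchmarks (CLEVR, GQA, TGIF-QA) were needed to expose its lack of multi-step reasoning.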
  23. 20/08/2020 23 A Prelim Dual-System Model Le, Thao Minh, Vuong Le, Svetha Venkatesh, and Truyen Tran. "Neural Reasoning, Fast and Slow, for Video Question Answering." IJCNN (2020).
  24. Separate reasoning process from perception 24  Video QA: inherent dynamic nature of visual content over time.  Recent success in visual reasoning with multi-step inference and handling of compositionality. System 1: visual representation System 2: High-level reasoning
  25. 20/08/2020 25 Separate reasoning process from perception (2)
  26. System 1: Clip-based Relation Network 26 For k = 2, 3, …, K, where h_φ and g_θ are linear transformations with parameters φ and θ, respectively, for feature fusion. Why temporal relations? • Situate an event/action in relation to events/actions in the past and formulate hypotheses on future events. • Long-range sequential modeling.
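A toy sketch of a k-ary temporal relation module in the spirit of this slide: for each size-k subset of clip features (k = 2 here), a shared transform g_θ fuses the pair, and h_φ aggregates the results into one relational summary. Linear maps with random weights stand in for the learned transformations; this is an illustration of the idea, not the paper's implementation.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n_clips, d = 5, 8
clips = rng.standard_normal((n_clips, d))   # per-clip features
G = rng.standard_normal((d, 2 * d))         # stand-in for g_theta (k = 2, pair fusion)
H = rng.standard_normal((d, d))             # stand-in for h_phi

def relation_k2(clips):
    """Sum g_theta over all time-ordered clip pairs, then apply h_phi."""
    pooled = sum(G @ np.concatenate([clips[i], clips[j]])
                 for i, j in combinations(range(len(clips)), 2))
    return H @ pooled

r = relation_k2(clips)   # a single d-dim relational summary of the video
```

Because the sum runs over pairs at arbitrary temporal distance, the summary captures long-range relations that a purely local (frame-to-frame) model would miss.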
  27. System 2: MAC Net 20/08/2020 27 Hudson, Drew A., and Christopher D. Manning. "Compositional attention networks for machine reasoning." ICLR 2018.
  28. Results on the SVQA dataset. 28

    Model     | Exist | Count | Integer Comparison | Attribute Comparison          | Query                         | All
              |       |       | More  Equal  Less  | Color Size  Type  Dir  Shape | Color Size  Type  Dir  Shape |
    SA(S)     | 51.7  | 36.3  | 72.7  54.8   58.6  | 52.2  53.6  52.7  53.0 52.3  | 29.0  54.0  55.7  38.1 46.3  | 43.1
    TA-GRU(T) | 54.6  | 36.6  | 73.0  57.3   57.7  | 53.8  53.4  54.8  55.1 52.4  | 22.0  54.8  55.5  41.7 42.9  | 44.2
    SA+TA     | 52.0  | 38.2  | 74.3  57.7   61.6  | 56.0  55.9  53.4  57.5 53.0  | 23.4  63.3  62.9  43.2 41.7  | 44.9
    CRN+MAC   | 72.8  | 56.7  | 84.5  71.7   75.9  | 70.5  76.2  90.7  75.9 57.2  | 76.1  92.8  91.0  87.4 85.4  | 75.8
  29. Results on the TGIF-QA dataset. 29

    Model    | Action | Trans. | FrameQA | Count
    ST-TP    | 62.9   | 69.4   | 49.5    | 4.32
    Co-Mem   | 68.2   | 74.3   | 51.5    | 4.10
    PSAC     | 70.4   | 76.9   | 55.7    | 4.27
    CRN+MAC  | 71.3   | 78.7   | 59.2    | 4.23
  30. 20/08/2020 30 A System 1 for Video Thao Minh Le, Vuong Le, Svetha Venkatesh, and Truyen Tran, “Hierarchical conditional relation networks for video question answering”, CVPR'20. System 1: visual representation System 2: High-level reasoning
  31. Conditional Relation Network Unit 31 Problems: • Lack of a generic mechanism for modeling the interaction of multimodal inputs. • Existing temporal relation models break local motion in videos. Inputs: • An array of n objects • Conditioning feature Output: • An array of m objects
  32. Hierarchical Conditional Relation Networks 32
  33. Results 33 TGIF-QA dataset:

    Model   | Action | Trans. | FrameQA | Count
    ST-TP   | 62.9   | 69.4   | 49.5    | 4.32
    Co-Mem  | 68.2   | 74.3   | 51.5    | 4.10
    PSAC    | 70.4   | 76.9   | 55.7    | 4.27
    HME     | 73.9   | 77.8   | 53.8    | 4.02
    HCRN    | 75.0   | 81.4   | 55.9    | 3.82

    MSVD-QA and MSRVTT-QA datasets.
  34. 20/08/2020 34 A System 2 for Relational Reasoning Thao Minh Le, Vuong Le, Svetha Venkatesh, and Truyen Tran, “Dynamic Language Binding in Relational Visual Reasoning”, IJCAI’20. System 1: visual representation System 2: High-level reasoning
  35. Reasoning with structured representation of spatial relation 35 • Key insight: Reasoning is chaining of relational predicates to arrive at a final conclusion → Needs to uncover spatial relations, conditioned on query → Chaining is query-driven → Objects/language need binding → Object semantics is query-dependent → Everything is end-to-end differentiable
  36. 36 Language-binding Object Graph Network for VQA
  37. 37 Language-binding Object Graph Unit (LOG)
  38. 20/08/2020 38
  39. Results on the CLEVR dataset 39
  40. 40 Comparison on the CLEVR dataset with different data fractions. Performance comparison on the VQA v2 subset of long questions:

    Model            | Acc.
    XNM (Objects)    | 43.4
    MAC (ResNet)     | 40.7
    MAC (Objects)    | 45.5
    LOGNet (Objects) | 46.8
  41. 20/08/2020 41
  42. 20/08/2020 42 Dual-process theories The problem Neural memories Visual reasoning Agenda
  43. “[…] one cannot hope to understand reasoning without understanding the memory processes […]” (Thompson and Feeney, 2014) 20/08/2020 43
  44. Dual-process system with memories 44 System 1: Intuitive • Fast • Implicit/automatic • Pattern recognition • Multiple System 2: Analytical • Slow • Deliberate/rational • Careful analysis • Single, sequential Memories: facts, semantics, events and relational associations; working space – temporal buffer (single)
  45. From psychology & cognitive science Reasoning and memory are studied separately, but …  Analytic reasoning requires working memory  Spatial reasoning requires place memory  Inferential reasoning requires episodic memory Dual-processes in memory:  Fast retrieval/recognition (using similarity as criteria)  Slow item recollection Dual-processes in reasoning  Induction as fast (using similarity as criteria)  Deduction as slow
  46. Reasoning as memory Expert reasoning was enabled by a large long-term memory, acquired through experience. Working memory for analytic reasoning:  WM capacity predicts IQ  Higher-order cognition = creating & manipulating relations → representation of premises, temporarily stored in WM  WM is a system to support information binding to a coordinate system  Reasoning as deliberative hypothesis testing → memory-retrieval-based hypothesis generation. Knowledge structures support reasoning. Fuzzy-trace theory: gist (→ intuition) and verbatim memory (→ analytic processing). Memory is critical for episodic future thinking (mental simulation). 20/08/2020 46
  47. 20/08/2020 47 Learning a Turing machine  Can we learn a (neural) program that learns to program from data? Visual reasoning is a specific program of two inputs (visual, linguistic)
  48. Neural Turing machine (NTM) A memory-augmented neural network (MANN) A controller that takes input/output and talks to an external memory module. Memory has read/write operations. The main issue is where to write, and how to update the memory state. All operations are differentiable. Source: rylanschaeffer.github.io
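The read/write mechanics described on this slide can be sketched in a few lines: content-based addressing (a softmax over cosine similarity to a key), a read as an attention-weighted sum of slots, and a write via erase/add vectors. This is a minimal illustration of NTM-style memory access, not the original implementation; sizes and the sharpness parameter are arbitrary toy choices.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 8, 4                       # memory: N slots of width M
memory = rng.standard_normal((N, M))

def address(memory, key, beta=5.0):
    """Soft attention over slots by cosine similarity to the key."""
    sim = memory @ key / (np.linalg.norm(memory, axis=1)
                          * np.linalg.norm(key) + 1e-8)
    e = np.exp(beta * sim)
    return e / e.sum()            # weights sum to 1 (differentiable addressing)

key = rng.standard_normal(M)
w = address(memory, key)          # addressing weights over the N slots
read = w @ memory                 # read: weighted sum of slot contents

erase = np.full(M, 0.5)           # what fraction of each slot dimension to wipe
add = rng.standard_normal(M)      # new content to blend in
memory = memory * (1 - np.outer(w, erase)) + np.outer(w, add)  # soft write
```

Because every step is a smooth function of the weights, gradients flow through both read and write, which is exactly what makes the memory trainable end to end.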
  49. MANN for reasoning Three steps:  Store data into memory  Read query, process sequentially, consult memory  Output answer Behind the scene:  Memory contains data & results of intermediate steps  LOGNet does the same; its memory consists of object representations Drawbacks of current MANNs:  No memory of controllers → less modularity and compositionality when the query is complex  No memory of relations → much harder to chain predicates. 20/08/2020 49 Source: rylanschaeffer.github.io
  50. 20/08/2020 50 Universal Neural Turing Machines Hung Le, Truyen Tran, Svetha Venkatesh, “Neural stored-program memory”, ICLR'20. Hung Le, Truyen Tran, Svetha Venkatesh, “Neurocoder: Learning General- Purpose Computation Using Stored Neural Programs”, Work in progress.
  51. Computing devices vs neural counterparts FSM (1943) ↔ RNNs (1982) PDA (1954) ↔ Stack RNN (1993) TM (1936) ↔ NTM (2014) The missing piece: a memory to store programs Neural stored-program memory UTM/VNA (1936/1945) ↔ NUTM (ours, 2019)
  52. NUTM = NTM + NSM
  53. Question answering (bAbI dataset) Credit: hexahedria
  54. Neurocoder Flexible programs are essential for intelligence Cognitive science: Meta-level process controls object-level process Instance-specific program vs task-specific program In QA, the program is query-specific → on-demand program synthesis Idea: A program to generate programs Previous works: Fast weights, HyperNetworks, Neural Module Networks, Adaptive Mixture of Experts Current work: Programs are generated by combining basis sub-programs → The hope: basis programs span the space of all possible programs 20/08/2020 54
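The "combine basis sub-programs" idea can be sketched as query-conditioned attention over a small bank of basis weight matrices, yielding an instance-specific program on demand. All parameters below are random stand-ins and the shapes are illustrative only; this is a conceptual sketch, not the Neurocoder architecture itself.

```python
import numpy as np

rng = np.random.default_rng(3)
n_basis, d_in, d_out, d_q = 4, 6, 3, 5
basis = rng.standard_normal((n_basis, d_out, d_in))  # basis sub-programs (weight mats)
Wq = rng.standard_normal((n_basis, d_q))             # maps query -> program scores

def synthesize_program(query):
    """Softmax attention over basis programs gives the working program's weights."""
    scores = Wq @ query
    a = np.exp(scores - scores.max())
    a /= a.sum()
    return np.tensordot(a, basis, axes=1)            # (d_out, d_in) blended program

query = rng.standard_normal(d_q)
program = synthesize_program(query)                  # query-specific program
x = rng.standard_normal(d_in)
y = program @ x                                      # run the synthesized program
```

If the basis matrices span a useful subspace, every query-specific program is reachable as a mixture, which is the hope stated on the slide.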
  55. Neurocoder (cont.) 20/08/2020 55
  56. 20/08/2020 56
  57. 20/08/2020 57 Neural relational memory Hung Le, Truyen Tran, Svetha Venkatesh, “Self-attentive associative memory”, ICML'20.
  58. Failures of current MANNs for reasoning Relational representation is NOT stored → can’t reuse it later in the chain A single memory of items and relations → can’t understand how relational reasoning occurs The memory-memory relationship is coarse, since it is represented as either a dot product or a weighted sum. 20/08/2020 58
  59. Self-attentive associative memories (SAM) Learning relations automatically over time 20/08/2020 59
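A toy sketch of the core idea behind self-attentive associative memory: relations between stored items are captured as outer products of attended item pairs, giving a second-order (matrix-valued) relational memory alongside the first-order item memory. Shapes and the attention form are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(4)
n_items, d = 6, 4
items = rng.standard_normal((n_items, d))            # first-order item memory

def soft_attend(items, query):
    """Softmax attention over items; returns one attended item vector."""
    s = items @ query
    a = np.exp(s - s.max())
    return (a / a.sum()) @ items

q1, q2 = rng.standard_normal(d), rng.standard_normal(d)
head = soft_attend(items, q1)                        # first item of the pair
tail = soft_attend(items, q2)                        # second item of the pair
relation = np.outer(head, tail)                      # second-order relation slot
```

Storing `relation` explicitly, rather than collapsing it into a scalar dot product, is what lets later steps reuse and chain relations, addressing the drawback on the previous slide.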
  60. Multi-step reasoning over graphs 20/08/2020 60
  61. Reasoning on text 20/08/2020 61
  62. 20/08/2020 62 Thank you Truyen Tran @truyenoz truyentran.github.io truyen.tran@deakin.edu.au letdataspeak.blogspot.com goo.gl/3jJ1O0 linkedin.com/in/truyen-tran