A Hypermap Model for Multiple Sequence Processing
Abel Nyamapfene
30 April 2007
Research Motivation
I am investigating complex sequence processing and multiple sequence processing using an unsupervised neural network processing paradigm based on the Hypermap model by Kohonen.
What is a Sequence?
A sequence is defined as a finite ordered set of pattern items:
S: s_1 - s_2 - ... - s_n
where s_j (j = 1, ..., n) is a component of the sequence and n is the length of the sequence.
Examples:
- In language processing: speech utterances, action sequences, gestural sequences
- In multimedia processing: video sequences, speech
Why Unsupervised Processing?
- Distributed sequence processing is prone to catastrophic interference.
- The requirement for a teaching signal is inappropriate for unsupervised processing applications.
Issues in Complex Sequence Processing
Definition: A sequence is complex if it contains repetitions of the same subsequence, as in C-O-N-F-R-O-N-T; otherwise it is a simple sequence.
Research Issue: In complex sequences the correct sequence component can only be retrieved by knowing the components prior to the current one. How can this be done automatically in an unsupervised neural network framework?
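One way to read this definition in code: a sequence is complex if some item (and hence some subsequence containing it) occurs more than once. This is a simplified, illustrative check of mine, not part of the model itself.

```python
def is_complex(sequence):
    """Return True if any item occurs more than once -- a simplified
    reading of the complex-sequence definition above."""
    return len(set(sequence)) < len(sequence)

print(is_complex(list("CONFRONT")))  # True  ('O' and 'N' repeat)
print(is_complex(list("TIME")))      # False (a simple sequence)
```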
Issues in Multiple Sequence Processing
Built for single sequence processing only: most existing sequential neural networks have no inbuilt mechanism to distinguish between multiple sequences.
Research Issue: How can a sequential neural network learn multiple sequences one after the other without catastrophic interference, the phenomenon whereby the most recently learned sequences erase the previously learned ones?
The Hypermap Model (Kohonen, 1991)
Key Features:
- A self-organising map in which patterns occur in the context of other patterns.
- The most recent elements prior to an element, or some processed form of them, are used as context.
- The context domain is selected using the input context vector; the best match is then picked from within the selected context domain using the input pattern vector.
Shortcomings:
- Cannot recall a sequence using time-varying context.
- Cannot handle multiple sequences.
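The two-phase matching above can be illustrated with a short sketch. This is not Kohonen's original formulation: the function name, the Euclidean distance measure and the fixed context radius are illustrative assumptions.

```python
import numpy as np

def hypermap_best_match(context_w, pattern_w, x_context, x_pattern, radius=0.5):
    """Two-phase Hypermap-style matching (illustrative sketch).

    Phase 1: the input context vector selects a context domain -- the
             units whose context weights lie within `radius` of it.
    Phase 2: the input pattern vector picks the best-matching unit
             from within that domain only.
    """
    # Phase 1: select the context domain
    ctx_dists = np.linalg.norm(context_w - x_context, axis=1)
    domain = np.where(ctx_dists <= radius)[0]
    if domain.size == 0:                 # fall back to the nearest context
        domain = np.array([np.argmin(ctx_dists)])

    # Phase 2: winner-take-all on the pattern part, restricted to the domain
    pat_dists = np.linalg.norm(pattern_w[domain] - x_pattern, axis=1)
    return domain[np.argmin(pat_dists)]

# Toy usage: 10 units with 3-d context weights and 4-d pattern weights
rng = np.random.default_rng(0)
context_w, pattern_w = rng.random((10, 3)), rng.random((10, 4))
print(hypermap_best_match(context_w, pattern_w, rng.random(3), rng.random(4)))
```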
Barreto-Araujo Extended Hypermap Model (1)
[Architecture diagram: sensory stimuli and context inputs pass through tapped delay lines (z^-1) to feedforward weights W; lateral weights M link the map units; network state a(t-1), y(t-1) evolves to a(t), y(t).]
Barreto-Araujo Extended Hypermap Model (1)
Key Features:
- Lateral weights encode the temporal order of the sequences.
- Context weights encode sequence identity.
- Sensor weights encode sequence item values.
Successes:
- Can recall entire sequences in the correct temporal order.
- Can handle multiple sequence processing (up to a point).
Shortcomings:
- Can only handle sequences with no repeating elements.
Barreto-Araujo Extended Hypermap Model (2)
[Architecture diagram: as in Model (1), with the context input split into a fixed context and a time-varying context alongside the sensorimotor stimuli; tapped delay lines (z^-1), lateral weights M, feedforward weights W, state a(t-1), y(t-1) to a(t), y(t).]
Barreto-Araujo Extended Hypermap Model (2)
Key Features:
- The time-varying context vector acts as an element ID within a sequence.
- The fixed context vector acts as a sequence identity vector.
Successes:
- Can handle both complex and multiple sequences.
Shortcomings:
- No mechanism to identify, anticipate and recall sequences using partial sequence data.
- Contextual processing of sequences is rather limited.
The Hypermap Model for Multiple Sequence Processing
We have modified the Barreto-Araujo model as follows:
- Incorporated a short-term memory mechanism to dynamically encode the time-varying context of each sequence item, making it possible to recall a stored sequence from its constituent subsequences.
- Incorporated inhibitory links to enable competitive queuing during context-dependent recall of sequences.
Temporal Hypermap Neuron
[Diagram: the (j-1)th, jth and (j+1)th neurons, each with pattern vector and context vector inputs, a threshold unit, and a tapped delay line of units D_j^0, D_j^1, D_j^2, ..., D_j^(d-1); neighbouring neurons are connected by Hebbian links and inhibitory links.]
The Competitive Queuing Scheme for Context-Based Recall
- The context vector is applied to the network and a winner-take-all mechanism activates all the target sequence neurons.
- Inhibitory links ensure that only the first neuron is free to fire.
- The next sequence neuron fires on deactivation of the first neuron.
- Neuron activation and deactivation continues until the entire sequence is retrieved.
- The scheme was first proposed by Estes [17], and used by Rumelhart and Norman [18] in their model of how skilled typists generate transposition errors.
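A minimal sketch of the competitive-queuing recall just described, assuming the sequence units are indexed in temporal order; it abstracts the network dynamics (activation levels, inhibitory weights) into a simple loop and is illustrative only.

```python
def competitive_queuing_recall(items):
    """Illustrative competitive-queuing recall, not the exact network dynamics.

    All units of the target sequence are activated in parallel by the
    context vector; forward inhibitory links mean a unit may fire only
    when no earlier unit in the sequence is still active. Firing is
    followed by deactivation, which releases the next unit in the queue.
    """
    active = set(range(len(items)))    # parallel activation of the whole sequence
    recalled = []
    while active:
        ready = min(active)            # the earliest still-active unit is uninhibited
        recalled.append(items[ready])  # that unit fires ...
        active.remove(ready)           # ... and is deactivated, freeing the next one
    return recalled

print(competitive_queuing_recall(list("Time")))   # ['T', 'i', 'm', 'e']
```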
The Short-Term Memory (STM) Mechanism for Sequence Item Identification and Recall
- On pattern input, the winning neuron applies a pulse to its tapped delay line.
- Each tap on a delay line feeds into a threshold logic unit.
- The inputs to each threshold logic unit are the output of the tap position to which it is connected on its neuron's delay line, as well as all the simultaneously active tap positions on later neurons in the sequence.
- Threshold unit activation levels and output activations depend on tap position, and winner-take-all ensures that the threshold logic unit with the highest level wins the competition.
- The STM mechanism transforms the network neurons into subsequence detectors which fire when their associated subsequence is entered into the network, one item at a time.
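The role of this mechanism can be illustrated with a toy subsequence detector. The sketch replaces the tapped delay lines and threshold logic units with an explicit suffix-match score and a winner-take-all over the scores; the function name and scoring rule are illustrative assumptions, not the network's actual equations.

```python
def current_position(stored, observed):
    """Identify the current position in `stored` after the items in
    `observed` have been entered one at a time.

    Each candidate position j is scored by the length of the run of most
    recent observed items that lines up with the stored items ending at j
    (the job done by the tapped delay lines and threshold logic units);
    a winner-take-all then picks the highest score.
    """
    best_pos, best_score = None, 0
    for j in range(len(stored)):
        score = 0
        for d in range(min(j + 1, len(observed))):   # deeper taps = older items
            if stored[j - d] == observed[-1 - d]:
                score += 1
            else:
                break
        if score > best_score:                       # WTA over the threshold units
            best_pos, best_score = j, score
    return best_pos

seq = list("CONFRONT")
print(current_position(seq, list("CONFRON")))  # 6 -> the second 'N'
print(current_position(seq, list("ON")))       # 2 -> "ON" alone is ambiguous;
                                               #      the first 'N' wins the tie
```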
Experimental Evaluation: Evaluation Criteria
Sought to evaluate the network's ability to:
- Store and recall complex sequences
- Handle multiple sequences with a high degree of overlap
Sought to compare performance with other models using a publicly available benchmark data set.
Experimental Evaluation 1: Evaluation Data
1. Learning and Memory II
2. Intelligent Control II
3. Pattern Recognition II
4. Hybrid Systems III
5. Probabilistic Neural Networks and Radial Basis Functions
6. Artificially Intelligent Neural Networks II
7. Time Series Prediction
8. Neural Systems Hardware
9. Image Processing
10. Applications of Neural Networks to Power Systems
11. Supervised Learning
The network correctly recalls sequences through context and when partial sequences are applied to the network:
Partial Sequence -> Recalled Sequence
- Hybrid -> Hybrid Systems III
- Artificially -> Artificially Intelligent Neural Networks II
- cog -> cognition II
- Neural Networks and -> Neural Networks and Radial Basis Functions
- Intelligent -> No choice, due to conflict between sequences 2 and 6
- Series -> Series Prediction
- Time -> Time Series Prediction
- Proc -> Processing
- Pro -> No choice, due to conflict between sequences 5 and 10
- Radial -> Radial Basis Functions
- Learning and -> Learning and Memory II
Case Study: Two-Word Child Language
- "there cookie" instead of "there is a cookie"
- "more juice" instead of "Can I have some more juice"
- "baby gone" instead of "The baby has disappeared"
Two-Word Model Assumptions
- The ordering of two-word utterances is not random but similar to adult speech word order (Gleitman et al., 1995).
- Two-word stage and one-word stage communicative intentions are similar: e.g. naming, pointing out, stating ownership, commenting on actions (Brown, 1973).
Leading us to the following two-word stage modelling assumptions:
- A multimodal speech environment, as in the one-word stage
- Consistent word ordering for each two-word utterance
Two-Word Model Simulation
Model based on a Temporal Hypermap with:
- Tri-modal neurons with weights for word utterances, perceptual entities, and conceptual relations (a minimal sketch of such a neuron's weight record follows the diagram below)
Data: 25 two-word child utterances from the Bloom'73 corpus
Temporal Hypermap Segment
[Diagram: the (j-1)th, jth and (j+1)th neurons, each receiving a word vector, a perceptual entity vector and a conceptual relation vector, with delay line elements (z^-1), threshold logic units, and inhibitory links between neurons.]
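A minimal sketch of what a tri-modal neuron's weights might look like. The field names, the combined distance measure and the use of NumPy vectors are my assumptions for illustration, not the simulation's actual data structures.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TriModalWeights:
    """Per-neuron weight vectors for the three input modalities."""
    word: np.ndarray        # word-utterance weights
    percept: np.ndarray     # perceptual-entity weights
    relation: np.ndarray    # conceptual-relation weights

    def distance(self, word, percept, relation):
        # combined match over the three modalities (equal weighting assumed)
        return (np.linalg.norm(self.word - word)
                + np.linalg.norm(self.percept - percept)
                + np.linalg.norm(self.relation - relation))

# Toy usage
unit = TriModalWeights(np.zeros(5), np.zeros(3), np.zeros(2))
print(unit.distance(np.ones(5), np.zeros(3), np.zeros(2)))   # ~2.236
```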
Discussion: Two-Word Model
- The Temporal Hypermap encodes utterances as two-word sequences.
- An utterance can be recalled by inputting a perceptual entity and a conceptual relation.
- The network is capable of utterance completion: entering a unique first word leads to generation of the entire two-word utterance.
Simulating the Transition from One-Word to Two-Word Speech
From saying: "cookie", "more", "down"
To saying: "there cookie", "more juice", "sit down"
One-Word to Two-Word Transition Model Assumptions
- From the 18th to the 24th month, child language undergoes a gradual and steady transition from one-word to two-word speech (Ingram, 1981; Flavell, 1971).
- The transition is gradual, continuous and has no precise start or end point (Tomasello & Kruger, 1992).
For the transition we assume, for each communicative intention:
- The transition probability increases with exposure (training).
- The transition is non-reversible.
Gated Multi-Net Simulation of the One-Word to Two-Word Transition
The gated multi-net architecture comprises:
- a modified counterpropagation network for one-word utterances
- a Temporal Hypermap for two-word sequences
- exposure-dependent inhibitory links from the Temporal Hypermap units to the counterpropagation network, to manage the transition from one-word to two-word output (a toy sketch of this gate follows the diagram below)
Data: 15 corresponding pairs of one-word and two-word child utterances from the Bloom'73 corpus
One-Word to Two-Word Model
[Diagram: a static counterpropagation (CP) network and a Temporal Hypermap, both receiving the word vector, perceptual entity vector and conceptual relationship vector; delay elements (z^-1) on the Temporal Hypermap, and an inhibitory link from the Temporal Hypermap to the CP network.]
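A toy sketch of the exposure-dependent gate, under the stated assumption that the transition probability increases with exposure. The logistic form and its parameters are illustrative choices of mine, not taken from the model itself.

```python
import math
import random

def utterance(one_word, two_word, exposures, threshold=10, gain=0.3):
    """Toy exposure-dependent gate (parameters are illustrative).

    The inhibitory link from the Temporal Hypermap onto the
    counterpropagation network strengthens with training exposure, so the
    probability of producing the two-word form rises with exposure.
    (Non-reversibility per communicative intention, once the switch has
    occurred, is not modelled in this sketch.)
    """
    p_two_word = 1.0 / (1.0 + math.exp(-gain * (exposures - threshold)))
    return two_word if random.random() < p_two_word else one_word

for n in (2, 10, 25):
    print(n, "exposures ->", utterance("cookie", "there cookie", n))
```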
Output Two-Word Utterances Plotted Against Number of Training Cycles
Future Work
The majority of multimodal models of early child language are static:
[Diagram: Plunkett et al. (1992) model, with image and label as both inputs and outputs]
[Diagram: Nyamapfene & Ahmad (2007) model, with a perceptual input and a conceptual relation input producing a one-word output]
But in reality, the early child language environment is both multimodal and temporal:
- Spoken words can be viewed as phoneme sequences and/or syllable sequences.
- Motherese directed at preverbal infants relies heavily on exaggerated emphasis of the temporal synchrony between gestures/ongoing actions and speech (Messer, 1978; Gogate et al., 2000; Zukow-Goldring, 2006).
So we intend to model early child language as comprising phonological word forms and perceptual inputs as temporal sequences.
We Will Make These Modifications
[Diagram: Temporal Hypermap neuron with pattern vector and context vector inputs, delay units D_j^0, D_j^1, D_j^2, a threshold unit, and Hebbian links to the (j-1)th and (j+1)th neurons, annotated with the two planned changes:]
1. Include pattern items from other sequences
2. Include feedback from concurrent sequences
Thank You Discussion and Questions ??!!
