
# From RNN to neural networks for cyclic undirected graphs


Published on May 7, 2020

Working Group on Graph Neural Network (by videoconference)


1. From RNN to neural networks for cyclic undirected graphs. Nathalie Vialaneix, INRAE/MIAT. WG GNN, May 7th, 2020.
2. To start: a brief overview of this talk...
3. Topic (What is this presentation about?) How to use (deep) neural networks to process relational (graph) data? I will first describe Recurrent Neural Networks (RNN) and their limits for processing graphs. Then, I will present two alternatives able to address these limits. I will try to stay non-technical (so potentially too vague) to focus on the most important take-home messages.
4. Description of the purpose of the methods. Data: a graph $G$, with a set of vertices $V = \{x_1, \ldots, x_n\}$ with labels $l(x_i) \in \mathbb{R}^p$, and a set of edges $E = (e_j)_{j=1,\ldots,m}$ that can also be labelled, $l(e_j) \in \mathbb{R}^q$. The graph can be directed or undirected, with or without cycles. Purpose: find a method (a neural network $\phi_w$ with weights $w$) that is able to process these data (using the information about the relations/edges between vertices) to obtain: a prediction $\phi_w(x_i)$ for every node in the graph, or a prediction $\phi_w(G)$ for the graph itself. Learning dataset: a collection of graphs, or a single graph (that can be disconnected), associated to predictions $y_i$ or $y(G)$.
5. The basis of the work: RNN for structured data
6. Framework. Reference: [Sperduti & Starita, 1997]: basic description of standard RNN; adaptations to deal with directed acyclic graphs (DAG); output is obtained at the graph level ($\phi_w(G)$). The article also mentions ways to deal with cycles, and other types of learning than the standard back-propagation that I'll describe.
7. From standard neuron to recurrent neuron. Standard neuron: $o = f\left(\sum_{j=1}^r w_j v_j\right)$, where the $v_j$ are the inputs of the neuron (often: the neurons in the previous layer).
8. From standard neuron to recurrent neuron. Recurrent neuron: $o(t) = f\left(\sum_{j=1}^r w_j v_j + w_S\, o(t-1)\right)$, where $w_S$ is the self weight.
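The two neuron types above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation; the function names and the choice of `tanh` as activation and zero initial state are my own assumptions.

```python
import numpy as np

def standard_neuron(v, w, f=np.tanh):
    # Standard neuron: o = f(sum_j w_j * v_j)
    return f(np.dot(w, v))

def recurrent_neuron(inputs, w, w_self, f=np.tanh):
    # Recurrent neuron: o(t) = f(sum_j w_j * v_j(t) + w_S * o(t-1)),
    # with the (assumed) convention o(0) = 0.
    o = 0.0
    history = []
    for v in inputs:
        o = f(np.dot(w, v) + w_self * o)
        history.append(float(o))
    return history
```

Feeding the recurrent neuron a sequence of input vectors produces one output per time step, each depending on the previous one through the self weight $w_S$.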
9. Using that type of recurrent neuron for DAG encoding (for a DAG with a supersource, here $x_5$): $o(x_i) = f\left(\sum_{j=1}^p w_j l_j(x_i) + \sum_{x_i \to x_{i'}} \hat{w}_{n(i')}\, o(x_{i'})\right)$, where $n(i')$ is the position of the vertex $x_{i'}$ within the children of $x_i$ (it means that the DAG is a positional DAG).
10. Using that type of recurrent neuron for DAG encoding (for a DAG with a supersource, here $x_5$). We have: $o(x_8) = f\left(\sum_{j=1}^p w_j l_j(x_8)\right)$ (and similarly for $x_7$ and $x_2$).
11. Using that type of recurrent neuron for DAG encoding (for a DAG with a supersource, here $x_5$). We have: $o(x_9) = f\left(\sum_{j=1}^p w_j l_j(x_9) + \hat{w}_1 o(x_8)\right)$ (and similarly for $x_3$).
12. Using that type of recurrent neuron for DAG encoding (for a DAG with a supersource, here $x_5$). We have: $o(x_{10}) = f\left(\sum_{j=1}^p w_j l_j(x_{10}) + \hat{w}_1 o(x_3)\right)$.
13. Using that type of recurrent neuron for DAG encoding (for a DAG with a supersource, here $x_5$). We have: $o(x_{11}) = f\left(\sum_{j=1}^p w_j l_j(x_{11}) + \hat{w}_1 o(x_2) + \hat{w}_2 o(x_7) + \hat{w}_3 o(x_9) + \hat{w}_4 o(x_{10})\right)$.
14. Using that type of recurrent neuron for DAG encoding (for a DAG with a supersource, here $x_5$). We have: $o(x_5) = f\left(\sum_{j=1}^p w_j l_j(x_5) + \hat{w}_1 o(x_{11})\right)$.
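The recursion walked through on the last slides can be sketched as a memoized traversal of the DAG. This is a hedged illustration, not Sperduti & Starita's code: the function name `dag_encoder`, the dictionary-based graph representation, and the `tanh` default are all assumptions made for the example.

```python
import numpy as np

def dag_encoder(labels, children, w, w_hat, f=np.tanh):
    # Memoized recursion implementing
    # o(x_i) = f(sum_j w_j * l_j(x_i) + sum_{children} w_hat[position] * o(child)).
    # children[i] is the ORDERED list of children of node i
    # (the positional-DAG assumption of the slides).
    memo = {}

    def o(i):
        if i not in memo:
            s = np.dot(w, labels[i])
            for pos, child in enumerate(children.get(i, [])):
                s += w_hat[pos] * o(child)
            memo[i] = f(s)
        return memo[i]

    return o
```

Calling the returned function on the supersource (the $x_5$ of the example) encodes the whole graph, since the recursion reaches every node below it exactly once thanks to the memoization.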
15. Using that type of recurrent neuron for DAG encoding. Learning can be performed by back-propagation: for a given set of weights $(w, \hat{w})$, recursively compute the outputs on the graph structure; reciprocally, compute the gradient from the output, recursively on the graph structure.
16. Generalization: cascade correlation for networks. Idea: make several layers of outputs $o^1(x), \ldots, o^r(x)$ such that $o^l(x)$ depends on $l(x)$, on $(o^{l'}(x'))_{x \to x',\; l' \le l}$ (as in the previous case), and also on $(o^{l'}(x))_{l' < l}$ (but these values are "frozen").
17. Main limits. Since the approach explicitly relies on the DAG order to successively compute the outputs of the nodes, it is not adapted to undirected or cyclic graphs. Also, the positional assumption on the neighbors of a given node (that an objective "order" exists between neighbors) is not easily met in real-world applications. It can only compute predictions for graphs (not for nodes). Note: the method is tested (in this paper) on logic problems (not described here).
18. A first approach using contraction maps, by Scarselli et al., 2009
19. Overview of the method: it is able to deal with undirected and cyclic graphs; it does not require a positional assumption on the neighbors of a given node; it can be used to make predictions at the graph and node levels. Main idea: use a "time"-dependent update of the neurons and use restrictions on the weights to constrain the NN to be a contraction map, so that the fixed point theorem can be applied.
20. Basic neuron equations. For each node $x_i$, we define: a neuron value expressed as $v_i = f_w\left(l(x_i), \{l(x_i, x_u)\}_{x_u \in \mathcal{N}(x_i)}, \{v_u\}_{x_u \in \mathcal{N}(x_i)}, \{l(x_u)\}_{x_u \in \mathcal{N}(x_i)}\right)$; an output value obtained from this neuron value as $o_i = g_w(v_i, l(x_i))$ (that can be combined into a graph output value if needed).
21. Basic neuron equations (continued). In a compressed form, the previous equations give: $V = F_w(V, l)$ and $O = G_w(V, l)$.
22. Making the process recurrent... The neuron value is made "time" dependent with: $v_i^t = f_w\left(l(x_i), \{l(x_i, x_u)\}_{x_u \in \mathcal{N}(x_i)}, \{v_u^{t-1}\}_{x_u \in \mathcal{N}(x_i)}, \{l(x_u)\}_{x_u \in \mathcal{N}(x_i)}\right)$. Equivalently, $V^{t+1} = F_w(V^t, l)$, so, provided that $F_w$ is a contraction map, $(V^t)_t$ converges to a fixed point (a sufficient condition is that the norm of $\nabla_V F_w(V, l)$ is bounded by $\mu < 1$).
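The fixed-point argument above can be illustrated with a generic iteration. This is a toy sketch of the Banach-fixed-point mechanism, not the GNN update itself; the function name and the stopping criterion are assumptions for the example.

```python
import numpy as np

def fixed_point(F, V0, tol=1e-8, max_iter=1000):
    # Iterate V^{t+1} = F(V^t); if F is a contraction map (Lipschitz
    # constant < 1), Banach's fixed point theorem guarantees convergence
    # to the unique fixed point, regardless of the initial state V0.
    V = V0
    for _ in range(max_iter):
        V_next = F(V)
        if np.max(np.abs(V_next - V)) < tol:
            return V_next
        V = V_next
    return V
```

For instance, `F = lambda V: 0.5 * V + 1.0` has Lipschitz constant 0.5 and unique fixed point 2.0, which the iteration recovers from any starting point. In the GNN, the penalty on $\nabla_V F_w$ (next slide) is what forces the learned update to behave like this toy map.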
23. What are $f_w$ and $g_w$? $g_w$ is a fully-connected MLP. $f_w$ is decomposed into $v_i = f_w(l(x_i), \ldots) = \sum_{x_u \in \mathcal{N}(x_i)} h_w(l(x_i), l(x_i, x_u), v_u, l(x_u))$, and $h_w$ is trained as a 1-hidden-layer MLP. Remark: another version is provided in which $h_w$ is obtained as a linear function whose intercept and slope are estimated by MLPs.
24. Training of the weights. The weights of the two MLPs are trained by the minimization of $\sum_{i=1}^n (y_i - g_w(v_i^T))^2$ but, to ensure that the resulting $F_w$ is a contraction map, the weights of $F_w$ are penalized during the training: $\sum_{i=1}^n (y_i - g_w(v_i^T))^2 + \beta L\left(\|\nabla_V F_w\|\right)$, with $L(u) = u - \mu$ if $u > \mu$ (and 0 otherwise), for given $\mu \in\; ]0, 1[$ and $\beta > 0$. The training is performed by gradient descent where the gradient is obtained by back-propagation. Back-propagation is simplified using the fact that $(v_i^t)_t$ tends to a fixed point.
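The penalized objective on this slide is easy to write down explicitly. A minimal sketch, assuming the residuals $y_i - g_w(v_i^T)$ and the Jacobian norm $\|\nabla_V F_w\|$ have already been computed elsewhere (the function names are mine, not the paper's):

```python
def contraction_penalty(jacobian_norm, mu):
    # L(u) = u - mu if u > mu, and 0 otherwise: the Jacobian norm of F_w
    # is penalized only when it exceeds the target contraction constant mu < 1.
    return max(jacobian_norm - mu, 0.0)

def penalized_loss(residuals, jacobian_norm, beta, mu):
    # sum_i (y_i - g_w(v_i^T))^2 + beta * L(||grad_V F_w||)
    return sum(r * r for r in residuals) + beta * contraction_penalty(jacobian_norm, mu)
```

Note that the penalty is one-sided: a model whose Jacobian norm is already below $\mu$ pays nothing, so the constraint is inactive once $F_w$ is a contraction.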
25. Applications. The method is illustrated on different types of problems: the subgraph matching problem (finding a subgraph matching a target graph in a large graph), in which the prediction is made at the node level (does the node belong to the subgraph or not?); recovering the mutagenic compounds among nitroaromatic compounds (molecules used as intermediate by-products in many industrial reactions), where compounds are described by the molecule graph with (qualitative and numerical) information attached to the nodes; web page ranking, in which the purpose is to predict a measure derived from Google's PageRank on a network of 5,000 web pages.
26. A second approach using a constructive architecture, by Micheli, 2009
27. Overview of the method: it is able to deal with undirected and cyclic graphs (but with no labels on the edges); it does not require a positional assumption on the neighbors of a given node; it can be used to make predictions at the graph and node levels (probably, though it is only made explicit for the graph level). Main idea: define an architecture close to the "cascade correlation network", with some "frozen" neurons that are not updated. The architecture is hierarchical and adaptive, in the sense that it stops growing when a given accuracy is achieved.
28. Neuron equations. As previously, neurons are computed in a recurrent way that depends on "time". The neuron state at time $t$ for vertex $x_i$ depends on its label and on the neuron states of the neighboring neurons at all past times: $v_i^t = f\left(\sum_j w_j^t l_j(x_i) + \sum_{t' < t} \hat{w}^{t t'} \sum_{x_u \in \mathcal{N}(x_i)} v_u^{t'}\right)$. Remarks: a stationarity assumption (the weights do not depend on the node nor on the edge) is critical to obtain a simple enough formulation; contrary to RNN or to the previous version, the $(v_u^{t'})_{t' < t}$ are not updated: the layers are trained one at a time and, once the training is finished, the neuron states are considered "frozen" (which is a way to avoid problems with cycles).
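The layer update above can be sketched as follows. This is a hedged illustration of the frozen-layers idea, not Micheli's implementation; the function name, the adjacency-list representation, and the `tanh` default are assumptions made for the example.

```python
import numpy as np

def next_layer_states(labels, neighbors, frozen_states, w_t, w_hat, f=np.tanh):
    # v_i^t = f(sum_j w_j^t * l_j(x_i) + sum_{t'<t} w_hat[t'] * sum_{u in N(i)} v_u^{t'})
    # frozen_states[t'] holds the already-trained states of layer t' and is
    # never modified, which is why cycles in the graph cause no problem here.
    n = len(labels)
    states = np.empty(n)
    for i in range(n):
        s = np.dot(w_t, labels[i])
        for t_prime, past in enumerate(frozen_states):
            s += w_hat[t_prime] * sum(past[u] for u in neighbors[i])
        states[i] = f(s)
    return states
```

Because each new layer only reads the frozen states of the previous ones, the computation needs a single pass over the nodes, even on a cyclic graph (e.g. two nodes that are each other's neighbor).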
29. Combining neuron outputs into a prediction. Output of layer $t$: $\phi_w^t(G) = \frac{1}{C} \sum_{i=1}^n v_i^t$, where $C$ is a normalization factor (equal to 1 or to the number of nodes, for instance). Output of the network: $\Phi_w(G) = f\left(\sum_t w^t \phi_w^t(G)\right)$.
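The graph-level readout on this slide is a short computation. A minimal sketch, assuming the per-layer node states are available as arrays (the function name and signature are mine):

```python
import numpy as np

def graph_output(layer_states, w_out, C, f=np.tanh):
    # phi^t_w(G) = (1/C) * sum_i v_i^t  (one scalar per layer t),
    # then Phi_w(G) = f(sum_t w^t * phi^t_w(G)).
    phis = np.array([states.sum() / C for states in layer_states])
    return f(np.dot(w_out, phis))
```

Averaging (C = number of nodes) makes the output insensitive to graph size, while C = 1 makes it grow with it; which normalization is appropriate depends on the prediction task.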
30. Training. Training is also performed by minimization of the squared error but: no constraint is needed on the weights; back-propagation is not performed through unfolded layers. Examples: a QSPR/QSAR task, which consists in transforming information on molecular structure into information on chemical properties (here: prediction of the boiling point value); classification of cyclic/acyclic graphs.
31. That's all for now... ... questions?
32. References. Micheli A (2009) Neural networks for graphs: a contextual constructive approach. IEEE Transactions on Neural Networks, 20(3): 498-511. Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G (2009) The graph neural network model. IEEE Transactions on Neural Networks, 20(1): 61-80. Sperduti A, Starita A (1997) Supervised neural networks for the classification of structures. IEEE Transactions on Neural Networks, 8(3): 714-735.