Relational Machine Learning
Applications and Models
Bhushan Kotnis
Heidelberg University
Table of contents
1. Introduction
2. Models
Introduction
Networks and Graphs
• Social Networks: Link Prediction, Relevant Ads, Feed
Recommendation.
• Biological Networks: Gene Ontology, Protein Interaction
Networks, Cellular Networks.
• Financial Networks: Assessing risk and exposure, providing
information, detecting fraud.
• Knowledge Graphs: Background knowledge for AI, intelligent
search engines.
Social Networks
• Problem: Rank ads/feeds, suggest relevant articles.
• Users are connected to one another and share interests,
demographic data, and news preferences.
• Linked machine-learning problem: predict ads, article
recommendations, feeds, etc. using a unified model.
Genetic Regulatory Network
• Gene regulatory network: a molecular interaction network of
genes interacting with proteins and other molecules.
• Problem: Infer the family and function of a gene from its
interactions; identify mutations leading to diseases.
• Link prediction problem: a linked ML problem because each
prediction depends on other predictions.
Financial Networks
• Interconnected banks, companies, commodities, products,
events, people, locations.
• Problem: Infer missing connections for estimating exposure.
• Problem: Reasoning using path correlations.
Knowledge Graphs
The KGC Problem
• Knowledge Graph: a set G of triples (s, r, t), with s, t ∈ E and r ∈ R.
• Ranking problem: Given a query (s, r, ?) and a target set
e_1, e_2, . . . , e_n, rank the targets by the plausibility of
relation r holding between s and each e_i.
• (Frankfurt, cityliesonriver, ?) Choices: Rhine, Mosel, Thames,
Main, Hudson.
• (user_id_201345, user_prefers_genre, ?) Choices: Fiction,
Non-Fiction, Horror, Romance, Fantasy.
• (TP53, disease, ?) Choices: none, Breast Cancer, Liver Cancer,
Lung Cancer.
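The ranking formulation above can be sketched in a few lines. This is a minimal illustration, not any specific model: the score function and the toy plausibility values are hypothetical stand-ins for a learned scoring model.

```python
# KGC as ranking: order candidate targets for a query (s, r, ?) by score.
def rank_targets(score, s, r, candidates):
    """Rank candidates for (s, r, ?) by descending plausibility."""
    return sorted(candidates, key=lambda t: score(s, r, t), reverse=True)

# Toy scores for (Frankfurt, cityliesonriver, ?); values are illustrative only.
toy_scores = {"Main": 0.9, "Rhine": 0.4, "Mosel": 0.2, "Thames": 0.1, "Hudson": 0.05}
ranking = rank_targets(lambda s, r, t: toy_scores[t],
                       "Frankfurt", "cityliesonriver", list(toy_scores))
print(ranking)  # "Main" ranks first under these toy scores
```

A real system would replace `toy_scores` with a learned function such as the factorization models introduced below.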
Models
Recommendation Engines
• Recommend movies. u_i is a vector representing user i and v_j a
vector representing product j, with u, v ∈ R^d.
• Minimize ∑_{i,j} (r_{i,j} − u_i^T v_j)^2 + regularizer.
• If the rating r_{i,j} is very high, then we want high similarity
(dot product) between the user and product vectors.
• These vectors are called latent factors. They are not directly
interpretable but could correspond to genres, topics, or themes;
they help generalization.
• Initialize them randomly and learn them with SGD. They capture
the structure of the rating matrix.
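The objective above can be minimized with plain SGD over the observed ratings. The following is a toy sketch under assumed hyperparameters (learning rate, regularization weight, dimensions are all illustrative); it treats every cell as observed for simplicity.

```python
import numpy as np

# Matrix-factorization sketch: learn latent factors u_i, v_j by SGD on
# sum_{i,j} (r_ij - u_i^T v_j)^2 + lam * (||u_i||^2 + ||v_j||^2).
rng = np.random.default_rng(0)
n_users, n_items, d = 4, 5, 4
R = rng.integers(1, 6, size=(n_users, n_items)).astype(float)  # toy ratings 1..5

U = 0.1 * rng.standard_normal((n_users, d))
V = 0.1 * rng.standard_normal((n_items, d))
lr, lam = 0.05, 0.01  # assumed hyperparameters

for _ in range(500):
    for i in range(n_users):
        for j in range(n_items):
            err = R[i, j] - U[i] @ V[j]           # residual r_ij - u_i^T v_j
            U[i] += lr * (err * V[j] - lam * U[i])  # gradient step on u_i
            V[j] += lr * (err * U[i] - lam * V[j])  # gradient step on v_j

print(np.abs(R - U @ V.T).mean())  # reconstruction error shrinks toward 0
```

In practice the sum runs only over observed (i, j) pairs, and the unobserved entries are what the learned factors predict.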
RESCAL Model
• Capture graph structure. The graph has multiple relations: users ×
products, users × demographics, products × categories.
• Solution: one matrix factorization problem for every relation.
• f(s, r, t) = x_s^T W_r x_t, where x_s, x_t ∈ R^d and W_r ∈ R^{d×d}.
• Max-margin loss: max[0, 1 − (f(s, r, t) − f(s, r, t′))]. One can
also use a softmax or an ℓ2 loss as in collaborative filtering.
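The RESCAL score and its max-margin loss are straightforward to write down. A minimal sketch with randomly initialized (untrained) parameters; the array names are illustrative:

```python
import numpy as np

# RESCAL scoring: f(s, r, t) = x_s^T W_r x_t, one d x d matrix per relation.
rng = np.random.default_rng(0)
n_entities, n_relations, d = 10, 3, 4
entity_emb = rng.standard_normal((n_entities, d))      # x_e for each entity
relation_mats = rng.standard_normal((n_relations, d, d))  # W_r for each relation

def score(s, r, t):
    return entity_emb[s] @ relation_mats[r] @ entity_emb[t]

def margin_loss(s, r, t, t_neg, margin=1.0):
    # max[0, margin - (f(s, r, t) - f(s, r, t'))]: push the positive triple
    # at least `margin` above the corrupted one.
    return max(0.0, margin - (score(s, r, t) - score(s, r, t_neg)))

print(score(0, 1, 2), margin_loss(0, 1, 2, 3))
```

Training would run SGD on this loss over positive triples and sampled negatives, updating both the entity embeddings and the relation matrices.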
Interpretation
Figure 1: RESCAL as a neural network computing score(s, r, t), with
three latent factors (d = 3).
• The outer product x_s x_t^T gives all possible latent-factor
interactions as a d × d matrix. The matrix W_r acts like a mask,
boosting or suppressing pairwise interactions.
• Entities appear in multiple relations as subjects or objects:
information sharing!
Bilinear Diag. and TransE Model
• RESCAL [2]: requires O(N_e d + N_r d^2) parameters. Scalability
issues for large N_r.
• Bilinear Diag [4]: force W_r to be a diagonal matrix. This assumes
symmetric relations. Why? Memory complexity: O(N_e d + N_r d).
• TransE [1]: f(s, r, t) = −||(x_s + x_r) − x_t||_2.
• TransE: can it model all types of relations? Why?
• Takeaway: make sure parameters are shared, either as a shared
representation or a shared layer.
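Both cheaper scoring functions fit in a few lines, which also makes the symmetry assumption of the bilinear-diagonal model easy to verify numerically. A sketch with random (untrained) vectors; variable names are illustrative:

```python
import numpy as np

# Bilinear-Diag and TransE scoring, each needing only d parameters per
# relation, versus RESCAL's d x d matrix.
rng = np.random.default_rng(0)
d = 4
x_s, x_t = rng.standard_normal(d), rng.standard_normal(d)
w_r = rng.standard_normal(d)  # diagonal of W_r: d parameters per relation
x_r = rng.standard_normal(d)  # TransE relation vector: also d parameters

def bilinear_diag(x_s, w_r, x_t):
    # x_s^T diag(w_r) x_t -- symmetric in s and t, hence the assumption.
    return np.sum(x_s * w_r * x_t)

def transe(x_s, x_r, x_t):
    # -||x_s + x_r - x_t||_2: a relation acts as a translation in R^d.
    return -np.linalg.norm(x_s + x_r - x_t)

# Swapping s and t leaves the bilinear-diagonal score unchanged:
assert np.isclose(bilinear_diag(x_s, w_r, x_t), bilinear_diag(x_t, w_r, x_s))
```

The assertion makes the limitation concrete: an asymmetric relation like "parent of" scores identically in both directions under the diagonal model.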
Negative Sampling
• How do we generate negative samples? Negatives may not be
provided.
• Closed-world assumption: if a triple is not a positive, then it
must be a negative.
• Max-margin loss: max[0, 1 − (f(s, r, t) − f(s, r, t′))]. Softer
negatives: (s, r, t′) only needs to score lower than (s, r, t).
• Softmax loss: log(1 + exp(−y_i f(s_i, r_i, t_i))). Negatives are
‘really’ negative.
• The number of negative samples used during training affects
performance. See [3].
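A common way to generate negatives under the closed-world assumption is to corrupt the target of a positive triple with a random entity, rejecting corruptions that are themselves known positives. A minimal sketch; the triples and entity list are illustrative:

```python
import random

# Negative sampling by target corruption under the closed-world assumption.
positives = {("Frankfurt", "cityliesonriver", "Main"),
             ("Bonn", "cityliesonriver", "Rhine")}
entities = ["Main", "Rhine", "Mosel", "Thames", "Hudson"]

def sample_negatives(triple, k, rng=random):
    """Corrupt the target of `triple` k times, skipping known positives."""
    s, r, _ = triple
    negatives = []
    while len(negatives) < k:
        t_neg = rng.choice(entities)
        if (s, r, t_neg) not in positives:  # closed world: unseen => negative
            negatives.append((s, r, t_neg))
    return negatives

negs = sample_negatives(("Frankfurt", "cityliesonriver", "Main"), 3)
assert all(n not in positives for n in negs)
```

Each positive triple paired with its sampled negatives then feeds the max-margin or softmax loss above; increasing `k` trades computation for a stronger training signal.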
Deep Learning
Figure 2: Source: Das et al. (2016). Example path: Microsoft
−isBasedIn→ Seattle −locatedIn→ Washington −locatedIn→ USA, scored
against the target relation countryofHQ. At each step, the RNN
consumes the entity and relation vectors along the path; the path
vector is its last hidden state, and the RNN parameters and relation
embeddings are shared across all query relations. The dot product
between the path representation and the query relation gives a
confidence score: a higher score indicates that the query relation
exists between the entity pair, i.e., that the path supports the
query.
Questions
I am convinced that the crux of the problem of learning is
recognizing relationships and being able to use them.
Christopher Strachey in a letter to Alan Turing, 1954.
References
[1] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and
O. Yakhnenko. Translating embeddings for modeling multi-relational
data. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and
K. Q. Weinberger, editors, Advances in Neural Information Processing
Systems 26, pages 2787–2795. Curran Associates, Inc., 2013.
[2] M. Nickel, V. Tresp, and H.-P. Kriegel. A three-way model for
collective learning on multi-relational data. In ICML, 2011.
[3] T. Trouillon, C. R. Dance, J. Welbl, S. Riedel, É. Gaussier, and
G. Bouchard. Knowledge graph completion via complex tensor
factorization. arXiv preprint arXiv:1702.06879, 2017.
[4] B. Yang, W.-t. Yih, X. He, J. Gao, and L. Deng. Embedding
entities and relations for learning and inference in knowledge
bases. arXiv preprint arXiv:1412.6575, 2014.