Relational learning, the prediction of properties of dyads, can be seen as an umbrella term covering machine learning problems such as matrix completion, multi-task learning, transfer learning, network prediction and zero-shot learning. Learning algorithms based on Kronecker kernels represent a dyad as a structured object and thus provide a computationally efficient and theoretically well-founded framework for tackling these problems. As an alternative to this pairwise feature representation, a two-step approach has been suggested that sequentially combines the knowledge from the two domains. This stepwise method allows us to construct a novel algorithm for dealing with very large datasets in an online fashion. We illustrate experimentally that our method can not only improve performance on a very large-scale multi-class classification task, but can also generalize to completely new classes.
A two-step method for large output spaces: pairwise learning and cross-validation shortcuts
1. A two-step method for large output spaces
Michiel Stock (twitter: @michielstock)
Outline: Motivation · Introductory example · Relational learning · Other applications · Pairwise learning methods · Kronecker kernel ridge regression · Two-step kernel ridge regression · Computational aspects · Cross-validation · Exact online learning · Take home messages

A two-step method to incorporate task features for large output spaces
Michiel Stock¹, Tapio Pahikkala², Antti Airola², Bernard De Baets¹ & Willem Waegeman¹
¹ KERMIT, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University
² Department of Computer Science, University of Turku
NIPS: extreme classification workshop
December 12, 2015
2. What will we read next?
Observed ratings (rows: books 1–5; columns: readers; · = unknown):

        Alice  Bob  Cedric  Daphne
book 1    5     ·     4       ·
book 2    1     ·     ·       4
book 3    ·     4     ·       ·
book 4    2     ·     1       ·
book 5    4     ·     3       ·
3. What will we read next?
The same observed ratings, now with side information: a social graph connecting the readers, and genre indicators for the books.

        Alice  Bob  Cedric  Daphne
book 1    5     ·     4       ·
book 2    1     ·     ·       4
book 3    ·     4     ·       ·
book 4    2     ·     1       ·
book 5    4     ·     3       ·

Genre indicators (rows: books 1–5; columns: four genres):
book 1: 1 1 0 1
book 2: 0 0 1 0
book 3: 1 1 0 0
book 4: 0 0 1 0
book 5: 0 1 0 1
4. What will we read next?
Completed ratings (observed entries kept, missing entries filled in):

        Alice  Bob  Cedric  Daphne
book 1   5     2.3   4      3.1
book 2   1     4.5   1.3    4
book 3   3.9   4     3.8    0.8
book 4   2     5.2   1      4.5
book 5   4     2.5   3      3.6

(The social graph and genre indicators from the previous slide are shown again.)
5. What will we read next?
The completed ratings and genre indicators as before, now extended with a new book (genre features 1 1 0 1). Its predicted ratings:

          Alice  Bob  Cedric  Daphne
new book   4.8   1.1   3.7    2.3
6. What will we read next?
As before, now also with a new reader, Eric; his predicted ratings for books 1–5 are 2.3, 4.0, 1.7, 4.8 and 2.9.
7. What will we read next?
Finally, the rating of the new book by the new reader Eric is predicted as 2.4 — a prediction for a pair in which neither the book nor the reader occurred in the training data.
8. Learning relations
Figure: the training label matrix and four prediction settings, depending on whether the instance and the task are in-sample or out-of-sample (Setting A: both in-sample; Settings B and C: either the task or the instance out-of-sample; Setting D: both out-of-sample).
9. Other cool applications: drug design
Predicting interactions between proteins and small compounds
10. Other cool applications: social network analysis
Predicting links between people
11. Other cool applications: food pairing
Finding ingredients that pair well
12. Learning with pairwise feature representations
Figure: feature representations for books and readers.
d: instance (e.g. book)
φ(d): instance features (e.g. genre)
t: task (e.g. reader)
ψ(t): task features (e.g. social network)
13. Learning with pairwise feature representations
The instance and task features are combined into a joint pairwise representation via the Kronecker product: φ(d) ⊗ ψ(t).
14. Learning with pairwise feature representations
Pairwise prediction function: f(d, t) = w⊤(φ(d) ⊗ ψ(t))
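The pairwise model is equivalent to a bilinear model φ(d)⊤Wψ(t), where W is the weight vector w reshaped into a matrix. A minimal numpy sketch of this identity (all variable names and toy dimensions are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 4, 3                      # number of instance and task features
phi = rng.standard_normal(p)     # instance features, e.g. genre indicators
psi = rng.standard_normal(q)     # task features, e.g. social-network features
w = rng.standard_normal(p * q)   # pairwise weight vector

# Explicit pairwise representation: f(d, t) = w'(phi(d) ⊗ psi(t))
f_kron = w @ np.kron(phi, psi)

# Equivalent bilinear form: f(d, t) = phi(d)' W psi(t), with W = reshape(w)
W = w.reshape(p, q)
f_bilinear = phi @ W @ psi

assert np.isclose(f_kron, f_bilinear)
```

The bilinear form avoids ever materializing the length-p·q Kronecker vector, which is what makes the pairwise setting tractable for large output spaces.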
15. Learning relations in two steps
Figure: the in-sample label matrix is extended to out-of-sample instances by an instance KRR and to out-of-sample tasks by a task KRR, linked through virtual instances.
1 Build a ridge regression model to generalize to new instances
2 Build a ridge regression model to generalize to new tasks
16. The two-step ridge regression
Prediction function:
f(d, t) = φ(d)⊤ W ψ(t)
Parameters can be found by solving:
Φ⊤YΨ = (Φ⊤Φ + λd I) W (Ψ⊤Ψ + λt I)
17. The two-step ridge regression
Two hyperparameters: λd and λt!
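The system above has the closed-form solution W = (Φ⊤Φ + λd I)⁻¹ Φ⊤YΨ (Ψ⊤Ψ + λt I)⁻¹, which is exactly what the two sequential ridge regression steps compute. A numpy sketch using a primal feature representation (the slides work with kernels; toy sizes and variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
n_d, n_t, p, q = 50, 20, 6, 4          # instances, tasks, feature dimensions
Phi = rng.standard_normal((n_d, p))    # instance feature matrix
Psi = rng.standard_normal((n_t, q))    # task feature matrix
Y = rng.standard_normal((n_d, n_t))    # label matrix (one column per task)
lam_d, lam_t = 1.0, 0.5                # the two regularization parameters

# Closed form: Phi'YPsi = (Phi'Phi + lam_d I) W (Psi'Psi + lam_t I)
A = Phi.T @ Phi + lam_d * np.eye(p)
B = Psi.T @ Psi + lam_t * np.eye(q)
W = np.linalg.solve(A, Phi.T @ Y @ Psi) @ np.linalg.inv(B)

# Equivalent sequential view:
# step 1: instance ridge regression (one output per task)
C = np.linalg.solve(A, Phi.T @ Y)          # p x n_t
# step 2: task ridge regression on the step-1 coefficients
W_seq = np.linalg.solve(B, (C @ Psi).T).T  # p x q

assert np.allclose(W, W_seq)

# Predict for a new (instance, task) pair: f(d, t) = phi(d)' W psi(t)
phi_new, psi_new = rng.standard_normal(p), rng.standard_normal(q)
score = phi_new @ W @ psi_new
```

Note that the two regularizers act on different sides of W, which is why they can be tuned independently per setting.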
18. Four ways of cross validation
Figure: the label matrix split four ways (Settings A–D), with cells marked as train, test or discarded.
19. Four ways of cross validation
Analytic shortcuts can be derived to perform LOOCV for each of the four settings, making the tuning of λd and λt essentially free!
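The shortcuts for all four pairwise settings are derived in the accompanying paper; their flavour can be seen in the classical leave-one-out identity for ordinary ridge regression, where the LOO residuals follow from a single fit via the diagonal of the hat matrix. A sketch (toy data and variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 5
Phi = rng.standard_normal((n, p))
y = rng.standard_normal(n)
lam = 1.0

# Hat matrix of ridge regression: y_hat = H y
H = Phi @ np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T)
y_hat = H @ y

# Shortcut: leave-one-out residuals from a single fit
loo_resid = (y - y_hat) / (1 - np.diag(H))

# Check against the naive recipe: refit without sample i
for i in range(3):
    mask = np.arange(n) != i
    w_i = np.linalg.solve(Phi[mask].T @ Phi[mask] + lam * np.eye(p),
                          Phi[mask].T @ y[mask])
    assert np.isclose(loo_resid[i], y[i] - Phi[i] @ w_i)
```

Because the shortcut reuses one fitted model, a full grid search over λd and λt costs little more than a single training run.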
20. Effect of regularization for the four settings
Data: protein–ligand interactions; evaluation by AUC (lighter = better performance).
Figure: contour plots of the cross-validated AUC as a function of λ (drugs) and λ (targets), one panel per setting (A–D).
There is a clear difference between the four settings in the values of λd and λt that work best!
21. Learning with mini-batches
Figure: the label matrix grows along both axes as data arrives — initial training data, new training instances, even more training instances, and likewise new training tasks.
22. Learning with mini-batches
Exact updating of the parameters when new training instances and/or tasks become available:
- scalable for “Big Data” applications
- updating the model in a dynamic environment
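In a primal-feature sketch (the method itself is kernel-based; names and toy sizes here are my own assumptions), exact mini-batch updating for new instances amounts to accumulating fixed-size sufficient statistics Φ⊤Φ and Φ⊤Y, from which the exact batch solution can be recovered at any time:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n_t = 6, 4, 20
lam_d, lam_t = 1.0, 0.5
Psi = rng.standard_normal((n_t, q))    # task features, fixed here

# Sufficient statistics for the instance side; their size (p x p and
# p x n_t) is fixed, no matter how many instances have been seen
A = lam_d * np.eye(p)       # accumulates Phi'Phi + lam_d I
M = np.zeros((p, n_t))      # accumulates Phi'Y

def update(Phi_batch, Y_batch):
    """Fold a mini-batch of new training instances into the statistics."""
    global A, M
    A += Phi_batch.T @ Phi_batch
    M += Phi_batch.T @ Y_batch

def current_W():
    """Recover the exact batch solution from the statistics."""
    B = Psi.T @ Psi + lam_t * np.eye(q)
    return np.linalg.solve(A, M @ Psi) @ np.linalg.inv(B)

# Stream two mini-batches; the result equals training on all data at once
Phi1, Y1 = rng.standard_normal((30, p)), rng.standard_normal((30, n_t))
Phi2, Y2 = rng.standard_normal((25, p)), rng.standard_normal((25, n_t))
update(Phi1, Y1)
update(Phi2, Y2)

Phi_all, Y_all = np.vstack([Phi1, Phi2]), np.vstack([Y1, Y2])
A_all = Phi_all.T @ Phi_all + lam_d * np.eye(p)
B_all = Psi.T @ Psi + lam_t * np.eye(q)
W_batch = np.linalg.solve(A_all, Phi_all.T @ Y_all @ Psi) @ np.linalg.inv(B_all)
assert np.allclose(current_W(), W_batch)
```

New tasks can be handled symmetrically by accumulating statistics on the Ψ side.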
23. Exact online learning for hierarchical text classification
Hierarchical text classification (> 12,000 labels): from 5,000 to 350,000 instances, in steps of 1,000 instances.
24. Why two-step ridge regression?
- Zero-shot learning, transfer learning, multi-task learning... in one line of code
- Theoretically well founded
- Allows for nifty computational tricks:
  - ‘free’ tuning of the hyperparameters
  - ‘free’ LOOCV for all four settings!
  - closed-form solution for updating with mini-batches
27. A two-step method to incorporate task features for large output spaces