A two-step method to incorporate task features for large output spaces

Michiel Stock¹, Tapio Pahikkala², Antti Airola², Bernard De Baets¹ & Willem Waegeman¹
¹ KERMIT, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University
² Department of Computer Science, University of Turku

NIPS Extreme Classification Workshop, December 12, 2015
twitter: @michielstock
What will we read next?

Observed ratings of five books by four readers (blank entries are unknown):

              Alice   Bob   Cedric   Daphne
    book 1      5              4
    book 2      1                       4
    book 3             4
    book 4      2              1
    book 5      4              3
Two sources of side information are available: a social graph connecting the readers, and genre features for the books (one binary row per book):

    book 1:  1 1 0 1
    book 2:  0 0 1 0
    book 3:  1 1 0 0
    book 4:  0 0 1 0
    book 5:  0 1 0 1
Using this side information, the missing ratings can be predicted (predicted values shown with decimals):

              Alice   Bob   Cedric   Daphne
    book 1     5      2.3     4        3.1
    book 2     1      4.5     1.3      4
    book 3     3.9    4       3.8      0.8
    book 4     2      5.2     1        4.5
    book 5     4      2.5     3        3.6
A new book with genre features 1 1 0 1 can be scored right away: predicted ratings 4.8 (Alice), 1.1 (Bob), 3.7 (Cedric) and 2.3 (Daphne).
Similarly, a new reader, Eric, described only by his links in the social graph, gets predicted ratings 2.3, 4.0, 1.7, 4.8 and 2.9 for the five books.
Even the rating of the new book by the new reader Eric can be predicted: 2.4.
Learning relations

Depending on whether the instance and the task were part of the training data, four prediction settings arise:
- Setting A: both the instance and the task are in-sample
- Settings B and C: either the instance or the task is out-of-sample
- Setting D: both the instance and the task are out-of-sample
Other cool applications: drug design
Predicting interactions between proteins and small compounds
Other cool applications: social network analysis
Predicting links between people
Other cool applications: food pairing
Finding ingredients that pair well
Learning with pairwise feature representations

Both the books and the readers have their own feature representation:
- d : an instance (e.g. a book)
- φ(d) : instance features (e.g. genre)
- t : a task (e.g. a reader)
- ψ(t) : task features (e.g. the social network)
A (book, reader) pair is represented jointly by the Kronecker product of the two feature vectors.
Pairwise prediction function: f(d, t) = wᵀ(φ(d) ⊗ ψ(t))
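A quick numpy sketch (my own illustration, not from the talk) of why this pairwise model is convenient: the weight vector w over the Kronecker features can be reshaped into a matrix W, so the same prediction can also be written as the bilinear form φ(d)ᵀ W ψ(t) used on the next slide. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative feature vectors for one book (instance) and one reader (task)
phi = rng.normal(size=4)   # instance features, e.g. genre indicators
psi = rng.normal(size=3)   # task features, e.g. social-network features

# Weights over the pairwise (Kronecker) feature representation
W = rng.normal(size=(4, 3))   # weights arranged as a matrix
w = W.ravel()                 # the same weights as one long vector

# f(d, t) = w' (phi ⊗ psi): a linear model on the Kronecker features ...
f_kron = w @ np.kron(phi, psi)

# ... which is identical to the bilinear form phi' W psi
f_bilinear = phi @ W @ psi

assert np.isclose(f_kron, f_bilinear)
```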
Learning relations in two steps

1. Build a ridge regression model on the instance features to generalize to new instances (instance KRR); its predictions serve as virtual instances for the second step.
2. Build a ridge regression model on the task features to generalize to new tasks (task KRR).
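A minimal numpy sketch of these two steps, assuming a fully observed label matrix Y, instance features in the rows of Phi and task features in the rows of Psi (illustrative code, not from the talk):

```python
import numpy as np

def two_step_ridge(Phi, Psi, Y, lam_d, lam_t):
    """Two-step (linear) ridge regression for pairwise data.

    Phi : (n, p) instance features, Psi : (m, q) task features,
    Y   : (n, m) labels for all in-sample (instance, task) pairs.
    Returns W (p, q) so that a pair is scored as phi @ W @ psi.
    """
    p, q = Phi.shape[1], Psi.shape[1]
    # Step 1: ridge regression over instances, one model per in-sample task
    M = np.linalg.solve(Phi.T @ Phi + lam_d * np.eye(p), Phi.T @ Y)      # (p, m)
    # Step 2: ridge regression over tasks, using step-1 outputs as virtual labels
    W = np.linalg.solve(Psi.T @ Psi + lam_t * np.eye(q), Psi.T @ M.T).T  # (p, q)
    return W

# Toy usage with random data
rng = np.random.default_rng(1)
Phi, Psi = rng.normal(size=(50, 6)), rng.normal(size=(20, 4))
Y = rng.normal(size=(50, 20))
W = two_step_ridge(Phi, Psi, Y, lam_d=1.0, lam_t=1.0)
score = Phi[0] @ W @ Psi[0]   # prediction for (instance 0, task 0)
```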
The two-step ridge regression

Prediction function:
    f(d, t) = φ(d)ᵀ W ψ(t)

The parameters can be found by solving:
    Φᵀ Y Ψ = (ΦᵀΦ + λd I) W (ΨᵀΨ + λt I)
Two hyperparameters: λd and λt!
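For a fully observed Y this system has a direct closed-form solution, equivalent to composing the two ridge steps above. A hedged one-function numpy sketch (same illustrative setup as before):

```python
import numpy as np

def two_step_ridge_closed_form(Phi, Psi, Y, lam_d, lam_t):
    """Solve  Phi' Y Psi = (Phi'Phi + lam_d I) W (Psi'Psi + lam_t I)  for W."""
    p, q = Phi.shape[1], Psi.shape[1]
    A = Phi.T @ Phi + lam_d * np.eye(p)   # instance-side regularized Gram matrix
    B = Psi.T @ Psi + lam_t * np.eye(q)   # task-side regularized Gram matrix
    # W = A^{-1} (Phi' Y Psi) B^{-1}, computed with two linear solves
    return np.linalg.solve(B, np.linalg.solve(A, Phi.T @ Y @ Psi).T).T
```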
Four ways of cross-validation

Cross-validation can mimic each of the four prediction settings: per setting, part of the label matrix is used for training, part is held out for testing, and part is discarded.
Analytic shortcuts can be derived to perform LOOCV for each setting, so tuning λd and λt is essentially free!
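To give a flavour of such a shortcut, here is the classical leave-one-out identity for ordinary ridge regression, which avoids refitting the model for every held-out point; the paper derives analogous identities for all four settings of the two-step method (this sketch is only the textbook single-matrix case, not that derivation):

```python
import numpy as np

def ridge_loo_residuals(X, y, lam):
    """Leave-one-out residuals for ridge regression without refitting.

    Uses the classical identity  e_loo_i = e_i / (1 - H_ii),
    where H = X (X'X + lam I)^{-1} X' is the hat matrix.
    """
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)  # hat matrix
    residuals = y - H @ y
    return residuals / (1.0 - np.diag(H))

# Sweep the regularization parameter and keep the best LOO error;
# each value costs one linear solve instead of n model refits.
rng = np.random.default_rng(2)
X, y = rng.normal(size=(40, 5)), rng.normal(size=40)
lambdas = [0.01, 0.1, 1.0, 10.0]
best = min(lambdas, key=lambda lam: np.mean(ridge_loo_residuals(X, y, lam) ** 2))
```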
Effect of regularization for the four settings
Data: protein-ligand interactions.
Evaluation by AUC (lighter = better performance)
Figure: contour plots of AUC as a function of the regularization for the drugs (λd) and for the targets (λt), one panel per setting (CV for Setting A, B, C and D).
The effect of λd and λt clearly differs between the four settings!
Learning with mini-batches
Figure: the label matrix grows over time, with new training instances and new training tasks arriving in mini-batches on top of the initial training data.
The parameters can be updated exactly when new training instances and/or tasks become available:
- scalable for “Big Data” applications
- the model can be updated in a dynamic environment
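As an illustration of how exact updates can be organized, the sketch below maintains only the instance-side ridge model under mini-batches of new instances, using the Woodbury identity (my own generic example; the paper's update rules also cover new tasks and the full two-step model):

```python
import numpy as np

class OnlineRidge:
    """Exact mini-batch updates for ridge regression via the Woodbury identity."""

    def __init__(self, n_features, lam):
        self.A_inv = np.eye(n_features) / lam   # (X'X + lam I)^{-1}, initially no data
        self.b = np.zeros(n_features)           # X'y

    def update(self, X_new, y_new):
        """Fold in a mini-batch of new rows (instances) exactly."""
        # Woodbury: (A + X'X)^{-1} = A^{-1} - A^{-1} X' (I + X A^{-1} X')^{-1} X A^{-1}
        V = X_new @ self.A_inv                           # (m, p)
        K = np.eye(X_new.shape[0]) + V @ X_new.T         # (m, m), small for small batches
        self.A_inv -= V.T @ np.linalg.solve(K, V)
        self.b += X_new.T @ y_new

    @property
    def weights(self):
        return self.A_inv @ self.b   # identical to refitting on all data seen so far

# Usage: stream three mini-batches through the model
rng = np.random.default_rng(3)
model = OnlineRidge(n_features=5, lam=1.0)
for _ in range(3):
    Xb, yb = rng.normal(size=(10, 5)), rng.normal(size=10)
    model.update(Xb, yb)
```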
Exact online learning for hierarchical text
classification
Hierarchical text classification (>12,000 labels): the training set grows from 5,000 to 350,000 instances in steps of 1,000 instances.
Why two-step ridge regression?
- Zero-shot learning, transfer learning, multi-task learning... in one line of code
- Theoretically well founded
- Allows for nifty computational tricks:
  - ‘free’ tuning for the hyperparameters
  - ‘free’ LOOCV for all four settings!
  - closed-form solution for updating with mini-batches