A two-step method to incorporate task features for large output spaces

Michiel Stock¹, Tapio Pahikkala², Antti Airola², Bernard De Baets¹ & Willem Waegeman¹
¹ KERMIT, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University
² Department of Computer Science, University of Turku

NIPS Extreme Classification Workshop, December 12, 2015
twitter: @michielstock
What will we read next?

Observed ratings of five books by four readers (blank entries are unknown):

              Alice   Bob   Cedric   Daphne
    book 1      5              4
    book 2      1                       4
    book 3             4
    book 4      2              1
    book 5      4              3
Two sources of side information are available: a social graph connecting the readers, and genre features for the books (one binary row per book):

    book 1:  1 1 0 1
    book 2:  0 0 1 0
    book 3:  1 1 0 0
    book 4:  0 0 1 0
    book 5:  0 1 0 1
Using this side information, the missing ratings can be predicted (predicted values shown with decimals):

              Alice   Bob   Cedric   Daphne
    book 1     5      2.3     4        3.1
    book 2     1      4.5     1.3      4
    book 3     3.9    4       3.8      0.8
    book 4     2      5.2     1        4.5
    book 5     4      2.5     3        3.6
A new book with genre features 1 1 0 1 can be scored right away: predicted ratings 4.8 (Alice), 1.1 (Bob), 3.7 (Cedric) and 2.3 (Daphne).
Similarly, a new reader, Eric, described only by his links in the social graph, gets predicted ratings 2.3, 4.0, 1.7, 4.8 and 2.9 for the five books.
Even the rating of the new book by the new reader Eric can be predicted: 2.4.
Learning relations

Depending on whether the instance and the task were part of the training data, four prediction settings arise:
- Setting A: both the instance and the task are in-sample
- Settings B and C: either the instance or the task is out-of-sample
- Setting D: both the instance and the task are out-of-sample
Other cool applications: drug design
Predicting interactions between proteins and small compounds
Other cool applications: social network analysis
Predicting links between people
Other cool applications: food pairing
Finding ingredients that pair well
Learning with pairwise feature representations

Both the books and the readers have their own feature representation:
- d : an instance (e.g. a book)
- φ(d) : instance features (e.g. genre)
- t : a task (e.g. a reader)
- ψ(t) : task features (e.g. the social network)
A (book, reader) pair is represented jointly by the Kronecker product of the two feature vectors.
Pairwise prediction function: f(d, t) = wᵀ(φ(d) ⊗ ψ(t))
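A quick numpy sketch (my own illustration, not from the talk) of why this pairwise model is convenient: the weight vector w over the Kronecker features can be reshaped into a matrix W, so the same prediction can also be written as the bilinear form φ(d)ᵀ W ψ(t) used on the next slide. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative feature vectors for one book (instance) and one reader (task)
phi = rng.normal(size=4)   # instance features, e.g. genre indicators
psi = rng.normal(size=3)   # task features, e.g. social-network features

# Weights over the pairwise (Kronecker) feature representation
W = rng.normal(size=(4, 3))   # weights arranged as a matrix
w = W.ravel()                 # the same weights as one long vector

# f(d, t) = w' (phi ⊗ psi): a linear model on the Kronecker features ...
f_kron = w @ np.kron(phi, psi)

# ... which is identical to the bilinear form phi' W psi
f_bilinear = phi @ W @ psi

assert np.isclose(f_kron, f_bilinear)
```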
Learning relations in two steps

1. Build a ridge regression model on the instance features to generalize to new instances (instance KRR); its predictions serve as virtual instances for the second step.
2. Build a ridge regression model on the task features to generalize to new tasks (task KRR).
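A minimal numpy sketch of these two steps, assuming a fully observed label matrix Y, instance features in the rows of Phi and task features in the rows of Psi (illustrative code, not from the talk):

```python
import numpy as np

def two_step_ridge(Phi, Psi, Y, lam_d, lam_t):
    """Two-step (linear) ridge regression for pairwise data.

    Phi : (n, p) instance features, Psi : (m, q) task features,
    Y   : (n, m) labels for all in-sample (instance, task) pairs.
    Returns W (p, q) so that a pair is scored as phi @ W @ psi.
    """
    p, q = Phi.shape[1], Psi.shape[1]
    # Step 1: ridge regression over instances, one model per in-sample task
    M = np.linalg.solve(Phi.T @ Phi + lam_d * np.eye(p), Phi.T @ Y)      # (p, m)
    # Step 2: ridge regression over tasks, using step-1 outputs as virtual labels
    W = np.linalg.solve(Psi.T @ Psi + lam_t * np.eye(q), Psi.T @ M.T).T  # (p, q)
    return W

# Toy usage with random data
rng = np.random.default_rng(1)
Phi, Psi = rng.normal(size=(50, 6)), rng.normal(size=(20, 4))
Y = rng.normal(size=(50, 20))
W = two_step_ridge(Phi, Psi, Y, lam_d=1.0, lam_t=1.0)
score = Phi[0] @ W @ Psi[0]   # prediction for (instance 0, task 0)
```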
The two-step ridge regression

Prediction function:
    f(d, t) = φ(d)ᵀ W ψ(t)

The parameters can be found by solving:
    Φᵀ Y Ψ = (ΦᵀΦ + λd I) W (ΨᵀΨ + λt I)
Two hyperparameters: λd and λt!
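For a fully observed Y this system has a direct closed-form solution, equivalent to composing the two ridge steps above. A hedged one-function numpy sketch (same illustrative setup as before):

```python
import numpy as np

def two_step_ridge_closed_form(Phi, Psi, Y, lam_d, lam_t):
    """Solve  Phi' Y Psi = (Phi'Phi + lam_d I) W (Psi'Psi + lam_t I)  for W."""
    p, q = Phi.shape[1], Psi.shape[1]
    A = Phi.T @ Phi + lam_d * np.eye(p)   # instance-side regularized Gram matrix
    B = Psi.T @ Psi + lam_t * np.eye(q)   # task-side regularized Gram matrix
    # W = A^{-1} (Phi' Y Psi) B^{-1}, computed with two linear solves
    return np.linalg.solve(B, np.linalg.solve(A, Phi.T @ Y @ Psi).T).T
```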
Four ways of cross-validation

Cross-validation can mimic each of the four prediction settings: per setting, part of the label matrix is used for training, part is held out for testing, and part is discarded.
Analytic shortcuts can be derived to perform LOOCV for each setting, so tuning λd and λt is essentially free!
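To give a flavour of such a shortcut, here is the classical leave-one-out identity for ordinary ridge regression, which avoids refitting the model for every held-out point; the paper derives analogous identities for all four settings of the two-step method (this sketch is only the textbook single-matrix case, not that derivation):

```python
import numpy as np

def ridge_loo_residuals(X, y, lam):
    """Leave-one-out residuals for ridge regression without refitting.

    Uses the classical identity  e_loo_i = e_i / (1 - H_ii),
    where H = X (X'X + lam I)^{-1} X' is the hat matrix.
    """
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)  # hat matrix
    residuals = y - H @ y
    return residuals / (1.0 - np.diag(H))

# Sweep the regularization parameter and keep the best LOO error;
# each value costs one linear solve instead of n model refits.
rng = np.random.default_rng(2)
X, y = rng.normal(size=(40, 5)), rng.normal(size=40)
lambdas = [0.01, 0.1, 1.0, 10.0]
best = min(lambdas, key=lambda lam: np.mean(ridge_loo_residuals(X, y, lam) ** 2))
```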
Effect of regularization for the four settings
Data: protein-ligand interactions.
Evaluation by AUC (lighter = better performance)
Figure: contour plots of AUC as a function of the regularization for the drugs (λd) and for the targets (λt), one panel per setting (CV for Setting A, B, C and D).
The effect of λd and λt clearly differs between the four settings!
Learning with mini-batches
Figure: the label matrix grows over time, with new training instances and new training tasks arriving in mini-batches on top of the initial training data.
The parameters can be updated exactly when new training instances and/or tasks become available:
- scalable for “Big Data” applications
- the model can be updated in a dynamic environment
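As an illustration of how exact updates can be organized, the sketch below maintains only the instance-side ridge model under mini-batches of new instances, using the Woodbury identity (my own generic example; the paper's update rules also cover new tasks and the full two-step model):

```python
import numpy as np

class OnlineRidge:
    """Exact mini-batch updates for ridge regression via the Woodbury identity."""

    def __init__(self, n_features, lam):
        self.A_inv = np.eye(n_features) / lam   # (X'X + lam I)^{-1}, initially no data
        self.b = np.zeros(n_features)           # X'y

    def update(self, X_new, y_new):
        """Fold in a mini-batch of new rows (instances) exactly."""
        # Woodbury: (A + X'X)^{-1} = A^{-1} - A^{-1} X' (I + X A^{-1} X')^{-1} X A^{-1}
        V = X_new @ self.A_inv                           # (m, p)
        K = np.eye(X_new.shape[0]) + V @ X_new.T         # (m, m), small for small batches
        self.A_inv -= V.T @ np.linalg.solve(K, V)
        self.b += X_new.T @ y_new

    @property
    def weights(self):
        return self.A_inv @ self.b   # identical to refitting on all data seen so far

# Usage: stream three mini-batches through the model
rng = np.random.default_rng(3)
model = OnlineRidge(n_features=5, lam=1.0)
for _ in range(3):
    Xb, yb = rng.normal(size=(10, 5)), rng.normal(size=10)
    model.update(Xb, yb)
```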
Exact online learning for hierarchical text
classification
Hierarchical text classification (>12,000 labels): the training set grows from 5,000 to 350,000 instances in steps of 1,000 instances.
Why two-step ridge regression?
- Zero-shot learning, transfer learning, multi-task learning... in one line of code
- Theoretically well founded
- Allows for nifty computational tricks:
  - ‘free’ tuning for the hyperparameters
  - ‘free’ LOOCV for all four settings!
  - closed-form solution for updating with mini-batches