Learning Better while Sending Less
Communication-Efficient Online Semi-Supervised Learning in Client-Server Settings
Han Xiao Technical University of Munich
Shou-De Lin National Taiwan University
Mi-Yen Yeh Academia Sinica
Phillip B. Gibbons Intel Labs
Claudia Eckert Technical University of Munich
1
Project solves online semi-supervised learning in
client-server settings
Framework overview: Client, Communication, Server
• Client. Task: generate data and send it to the server. Challenge: large volume of data, most of it unlabeled.
• Communication. Task: transmit client and server data to each other. Challenge: network bandwidth is limited.
• Server. Task: learn a classification model from the client's uploads. Challenge: the incoming data stream is only partially labeled.
Framework Goal
Design a modular framework to provide high
classification accuracy with reduced communication
and labeling costs.
2
An intelligent traffic management system involves
distributed learning
[Figure: a surveillance camera captures real-time traffic images hourly (9 am, 10 am, 11 am, ...) and sends them over the network to a server-side classifier for automatic road-condition recognition.]
Distributed learning in the client-server setting
3
An intelligent wearable device involves distributed learning
[Figure: a wearable device streams real-time sensory data from daily activities over Bluetooth to a laptop-side classifier for human activity recognition.]
Distributed learning in the client-server setting
4
Outline
• Related work
• Gap & Question
• Method
• Result
• Summary
5
Project shares characteristics of online, semi-supervised, and active learning
Online learning
• Passive aggressive [JMLR03]
• Confidence weighted [ICML06, NIPS07]
• Adaptive regularization of weights [NIPS09]
• Exact soft confidence weight [ICML12]
Semi-supervised learning
• Semi-supervised support vector machines
• Harmonic function solution [ICML03]
• SSL with max-margin graph cuts [AISTATS10]
Active learning
• Submodular functions [AAAI07, ICRA10]
Online semi-supervised learning
• Harmonic function on a quantized graph [UAI09]
• Bootstrap AROW [ACML12]
Online active learning
• Unbiased online active learning [KDD11]
Active semi-supervised learning
• Graph risk on the harmonic function solution [ICML04]
[Venn diagram of the online, active, and semi-supervised settings: regions 1-6 mark the previous work listed above; the three-way intersection (7) marks this research.]
6
Project considers three settings jointly for
communication-efficient learning
[Venn diagram repeated: the three-way intersection of the online, active, and semi-supervised settings is the gap this project targets.]
Research gap
• Data is only partially labeled
• Data comes in sequentially
• No oracle is available for providing feedback
• Need to take the bandwidth limit into account
• Need to deal with unlabeled data
• Need to learn incrementally
The project addresses all of these requirements jointly.
7
Project develops algorithms for both client and
server
[Framework diagram: the client buffers unlabeled data in a candidate pool and applies a selection policy to choose uploads; the server feeds the uploads to a two-learner model and sends an updated selection policy back to the client.]
Key steps
1 Client fills a candidate pool with unlabeled data
2 Once the pool is full, the client selects high-priority instances and uploads them to the server
3 The server receives the unlabeled data and feeds it to a two-learner model
4 The server updates the model and sends the new selection policy to the client
5 The client receives the new selection policy, clears the candidate pool, and goes back to step 1 (a sketch of this loop follows below)
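The five steps form a simple loop between client and server. Below is a minimal single-process sketch of that loop; the pool size, upload budget, toy margin-based selection rule, and toy server update are illustrative stand-ins (the actual selection policy and two-learner model are described on the next two slides), not the authors' implementation.

```python
import random

# Minimal single-process simulation of the five key steps above.
# POOL_SIZE, UPLOAD_BUDGET, the margin-based selection rule, and the
# perceptron-style server update are hypothetical stand-ins.

POOL_SIZE = 50
UPLOAD_BUDGET = 10   # 20% of the pool, matching the sampling rate used in the experiments


def select(pool, policy_weights, budget):
    """Step 2: upload the instances the current policy ranks highest
    (here: smallest |margin|, i.e. the most uncertain ones)."""
    return sorted(pool, key=lambda x: abs(sum(w * xi for w, xi in zip(policy_weights, x))))[:budget]


def server_update(weights, batch):
    """Steps 3-4: incrementally update the server-side model on the upload.
    A self-training perceptron-style update stands in for the two-learner model."""
    for x in batch:
        pred = 1.0 if sum(w * xi for w, xi in zip(weights, x)) >= 0 else -1.0
        weights = [w + 0.01 * pred * xi for w, xi in zip(weights, x)]
    return weights


def run(stream, dim):
    policy = [0.0] * dim          # selection policy held by the client
    model = [0.0] * dim           # model held by the server
    pool = []
    for x in stream:              # step 1: buffer incoming unlabeled data
        pool.append(x)
        if len(pool) == POOL_SIZE:
            upload = select(pool, policy, UPLOAD_BUDGET)   # step 2
            model = server_update(model, upload)           # steps 3-4 (server side)
            policy = list(model)                           # step 4: new policy sent back
            pool = []                                      # step 5: clear pool, repeat
    return model


if __name__ == "__main__":
    random.seed(0)
    stream = [[random.gauss(0, 1) for _ in range(5)] for _ in range(500)]
    run(stream, dim=5)
```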
8
Server employs a two-learner model to learn from the client's unlabeled data
[Framework diagram repeated from slide 8.]
Purpose
• Incrementally learn a binary classifier from unlabeled data
Requirements
• Leverage neighbor information to exploit unlabeled data
• Learn in an online fashion
• Be efficient enough to handle large volumes of data
• Be easily parameterized as a selection policy
Method
• Two-learner structure: a harmonic solution (HS) learner and a soft confidence-weighted (SCW) learner
• HS teaches its most certain instances to SCW (see the sketch below)
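The sketch below illustrates how the two learners interact: the harmonic solution pseudo-labels an uploaded batch on a similarity graph, and only the pseudo-labels it is most certain about are fed to a confidence-weighted linear learner. The Gaussian-kernel graph, the certainty threshold, and the simplified confidence-weighted update are illustrative assumptions, not the exact HS and SCW formulations of the cited papers.

```python
import numpy as np

# Sketch: harmonic solution (HS) on a similarity graph pseudo-labels the upload,
# and only the most certain pseudo-labels "teach" a confidence-weighted learner.
# Kernel bandwidth, certainty threshold, and the update rule are illustrative choices.

def harmonic_solution(X_l, y_l, X_u, gamma=1.0):
    """Harmonic function solution f_u = -L_uu^{-1} L_ul y_l on a
    Gaussian-kernel graph over labeled and unlabeled points."""
    X = np.vstack([X_l, X_u])
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-gamma * d2)
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W                     # graph Laplacian
    n_l = len(X_l)
    L_uu, L_ul = L[n_l:, n_l:], L[n_l:, :n_l]
    reg = 1e-8 * np.eye(len(X_u))                      # numerical safeguard
    return np.linalg.solve(L_uu + reg, -L_ul @ y_l)    # soft labels, roughly in [-1, 1]


def cw_style_update(mu, Sigma, x, y, eta=0.9):
    """Simplified confidence-weighted update: if the margin is not confident
    enough, shift the mean and shrink the variance along x (not closed-form SCW)."""
    margin = y * (mu @ x)
    v = x @ Sigma @ x
    if margin < eta:
        alpha = (eta - margin) / (v + 1e-12)
        mu = mu + alpha * y * (Sigma @ x)
        Sigma = Sigma - np.outer(Sigma @ x, Sigma @ x) / (v + 1.0)
    return mu, Sigma


def two_learner_step(mu, Sigma, X_l, y_l, X_u, certainty=0.8):
    """HS teaches SCW: only pseudo-labels with |f| above the threshold are used."""
    f_u = harmonic_solution(X_l, y_l, X_u)
    for x, f in zip(X_u, f_u):
        if abs(f) >= certainty:                        # most certain instances only
            mu, Sigma = cw_style_update(mu, Sigma, x, float(np.sign(f)))
    return mu, Sigma


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_l = np.vstack([rng.normal(-2, 0.4, (5, 2)), rng.normal(2, 0.4, (5, 2))])
    y_l = np.array([-1.0] * 5 + [1.0] * 5)
    X_u = rng.normal(0, 2.0, (30, 2))                  # the uploaded unlabeled batch
    mu, Sigma = two_learner_step(np.zeros(2), np.eye(2), X_l, y_l, X_u)
```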
9
Client uploads only crucial data according to the
selection policy
[Framework diagram repeated from slide 8.]
Purpose
• Select a small set of data from the candidate pool for uploading
Requirements
• Uploaded data should improve the classification performance on the server
• The selection procedure should be lightweight for the client
• The selection policy should be lightweight for the network
Method
• Use the current weights of SCW to construct the selection policy
• Optimize a submodular function consisting of two criteria: uncertainty w.r.t. SCW and redundancy w.r.t. the candidate pool (a greedy sketch follows below)
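One common way to make these two criteria concrete is a monotone submodular objective maximized greedily under the upload budget: a modular uncertainty term derived from the current SCW weights plus a facility-location coverage term that penalizes redundant picks. The specific objective below (margin-based uncertainty, Gaussian similarity, trade-off weight lam) is an illustrative instantiation, not necessarily the objective used in the paper.

```python
import numpy as np

# Greedy maximization of  F(S) = sum_{x in S} uncertainty(x)
#                               + lam * sum_{v in pool} max_{x in S} sim(v, x).
# The first term prefers instances the current SCW is uncertain about; the second
# (facility-location) term rewards covering the pool, i.e. penalizes redundancy.
# Margin-based uncertainty, Gaussian similarity, lam, and gamma are illustrative.

def uncertainty(x, mu):
    """Smaller |margin| under the current SCW mean weights = more uncertain."""
    return 1.0 / (1.0 + abs(mu @ x))


def greedy_select(pool, mu, budget, lam=1.0, gamma=1.0):
    pool = [np.asarray(x, dtype=float) for x in pool]
    n = len(pool)
    sims = np.array([[np.exp(-gamma * np.sum((a - b) ** 2)) for b in pool] for a in pool])
    best_cover = np.zeros(n)          # max similarity of each pool point to the selected set
    selected = []
    for _ in range(min(budget, n)):
        gains = np.full(n, -np.inf)
        for i in range(n):
            if i in selected:
                continue
            cover_gain = (np.maximum(best_cover, sims[i]) - best_cover).sum()
            gains[i] = uncertainty(pool[i], mu) + lam * cover_gain
        i_star = int(np.argmax(gains))
        selected.append(i_star)
        best_cover = np.maximum(best_cover, sims[i_star])
    return selected                   # indices of instances to upload


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pool = rng.normal(size=(50, 5))                    # a full candidate pool
    mu = rng.normal(size=5)                            # current SCW mean sent as the policy
    upload_idx = greedy_select(pool, mu, budget=10)    # roughly a 20% sampling rate
```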
10
Experiments validated algorithms on both server
and client
Goal
• Explore a good combination of techniques for communication-efficient online semi-supervised learning
Data sets
• 10 data sets downloaded from the UCI and LibSVM websites
Evaluation
• Offline accuracy on a test set
Sessions
1 Benchmark the model on the server. Fix the labeling rate to 2%, the sampling rate to 20%, and the selection policy on the client to "rand".
2 Benchmark the selection strategy on the client. Fix the labeling rate to 2%, the sampling rate to 20%, and the server's model to the best obtained in session 1.
3 Explore how the labeling rate and sampling rate affect the overall performance. Fix the server's model to the best obtained in session 1 and the client's policy to the best obtained in session 2 (the sessions are restated as configurations below).
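For reference, the three sessions can be written down as configurations. The entries below only restate the fixed values given on this slide; the data set list and the "best of session 1/2" choices are placeholders to be filled in from the earlier sessions.

```python
# The three experiment sessions restated as configurations. Values marked
# "varies" are the quantity being benchmarked in that session; the 10 UCI /
# LibSVM data sets are not named on the slide, so they stay as a placeholder.

DATASETS = ["<10 data sets from UCI and LibSVM>"]

SESSIONS = {
    1: {  # benchmark the server-side model
        "labeling_rate": 0.02,
        "sampling_rate": 0.20,
        "client_policy": "rand",
        "server_model": "varies",            # HS+SCW+CUT, HS+SCW, SCW, KNN+SCW, KNN, ...
    },
    2: {  # benchmark the client-side selection policy
        "labeling_rate": 0.02,
        "sampling_rate": 0.20,
        "client_policy": "varies",           # submod, uncertain, rand, certain, all, ...
        "server_model": "best of session 1",
    },
    3: {  # study the effect of labeling and sampling rates
        "labeling_rate": "varies",
        "sampling_rate": "varies",
        "client_policy": "best of session 2",
        "server_model": "best of session 1",
    },
}
```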
Two-learner model effectively learns from
unlabeled data
12
Model on server, description, and accuracy averaged over 10 data sets:
• Full (92.16%): All uploaded instances are labeled by an oracle. This approach should give the best result due to the availability of full information.
• HS+SCW+CUT (86.71%): Proposed two-learner model with cutoff averaging for predicting test data.
• HS+SCW (86.38%): Proposed two-learner model on the server.
• SCW (83.73%): The server consists of an SCW model only, which "learns" each unlabeled instance using its own prediction.
• None (84.55%): No unlabeled instances are uploaded to the server. The server stops learning right after the labeled instances. This approach should give the worst performance.
• KNN+SCW (84.31%): The server uses a two-learner model with KNN followed by SCW; the prediction of KNN is used for training SCW.
• KNN (82.89%): The server employs the 5-nearest-neighbors algorithm. The training set is built by first including all labeled instances and then adding unlabeled instances with their predicted labels.
[Plot: result on one representative data set.]
Better selection policy achieves higher accuracy with the same communication budget
13
Selection policy on client, description, and accuracy averaged over 10 data sets:
• Full (92.16%): All uploaded instances are labeled by an oracle. This approach should give the best result due to the availability of full information.
• Submod (87.08%): Selection is done by optimizing a submodular function, which considers both uncertainty and redundancy.
• Uncertain (87.12%): The most uncertain instances are uploaded.
• Rand (86.38%): Instances are selected randomly for uploading.
• All (86.32%): All unlabeled instances are uploaded without selection. This incurs 5x the communication cost of the other approaches.
• Certain (82.39%): The most certain instances according to the current SCW on the server are uploaded.
• None (82.89%): The server employs the 5-nearest-neighbors algorithm; the training set is built from all labeled instances plus unlabeled instances with their predicted labels.
[Plot: result on one representative data set.]
Best combination of techniques reduces
communication cost while maintaining accuracy
14
Selection policy on client | Labeling rate (amount of human effort) | Sampling rate (amount of communication cost) | Accuracy averaged over 10 data sets
Full | 100% | 20% | 92.16%
All | 2% | 100% | 86.32%
Rand | 2% | 20% | 86.38%
Best comb. (submod) | 2% | 20% | 87.08%
[Framework diagram repeated from slide 8.]
Project establishes a framework that enables
communication-efficient learning in client-server settings
15
• Introduce a novel learning setting motivated by many big-data applications.
• Propose a framework that is modular in design, flexible, and can be practically incorporated into a variety of useful systems.
• Present novel techniques at the clients and the server that are well suited to providing high classification accuracy with reduced communication and labeling costs.
• Show that a particular combination of techniques outperforms the other approaches, and often outperforms (communication-expensive) approaches that send all the data to the server.