1. CENTER FOR COGNITIVE UBIQUITOUS COMPUTING
CUbiC
ARIZONA STATE UNIVERSITY
A Study of Boosting based Transfer Learning for
Activity and Gesture Recognition
Ashok Venkatesan
Committee Members
Sethuraman Panchanathan, Professor (Chair)
Jieping Ye, Associate Professor
Baoxin Li, Associate Professor
Master’s Thesis Defense
Outline
• Motivation
• Transfer Learning
• Problem and Related Work
• Cost-Sensitive Boosting
• Results and Discussions
• Conclusion
Outline
• Motivation : Real World Data, Dataset Shifts, Traditional Learning
• Transfer Learning
• Problem and Related Work
• Cost-Sensitive Boosting
• Results and Discussions
• Conclusion
Real-World Data
Difficult to learn as it is Non-Stationary and Continuously Evolving
Example : Spam Filtering
A spam filter is trained on random emails tracked from a group of users
under the assumption that new users would classify spam identically.
1. What if the training data is no longer relevant?
2. What if the user preferences are not identical?
Motivational Example : Accelerometer Based 3D Gesture Recognition
A gesture recognition model is trained on mock data obtained in a controlled environment
under the assumption that real-life data would be identical.
1. What if the user has peculiar traits?
2. What if environmental factors and the objects interacted with vary and impact the
properties of the gesture?

(Example gestures: Scoop, Stir)
Dataset Shift [1]
• Simple Covariate Shift : change in P(x) due to the change in a known covariate.
• Prior Probability Shift : change in P(y) when P(y|x) is modeled as P(x|y)P(y).
• Sample Selection Bias : P(x_i) ≠ P(x).
• Imbalanced Data : change in P(y) by design.
• Domain Shift : change in the measurement system of x_i.
• Source Component Shift : involves changes in the strength of contributing components.
• Concept Drift : change in P(y|x) in continuous and real-time data streams.

[1] Quiñonero-Candela, J., et al., Dataset Shift in Machine Learning. The MIT Press, 2009
Traditional Learning
• Training and test examples are assumed to be independently drawn and identically distributed.
• NOT SUITED FOR HANDLING DATASET SHIFTS

(Figure: Traditional Learning over Multiple Domains; each task is passed to the learning algorithm separately, yielding one model per task.)
Outline
• Motivation
• Transfer Learning : Definition, Learning Settings, Notation,
Problem, What to Transfer?
• Instance-Weighting using Boosting
• Cost-Sensitive Boosting
• Results and Discussions
• Conclusion
Definition[2][3]
“Transfer Learning is a methodology that uses prior acquired
knowledge to effectively develop a new hypothesis. It emphasizes
knowledge transfer across domains, tasks and distributions that are
similar but not the same.”
[2] NIPS Inductive Transfer Workshop, 2005
[3] Pan, S.J. and Yang, Q., "A Survey on Transfer Learning", TKDE, 2009
• It is motivated by human learning: people can often transfer knowledge learnt previously to novel situations.
• e.g. knowing how to ride a bicycle might help improve learning to ride a motorbike.
• Outdated data representing prior knowledge is referred to as the Source.
• Newer data representing the newer knowledge is referred to as the Target.
• A domain is D = {X, P(X)} and a task is T = {Y, f(·)}; a shift may involve P(X), P(Y) and P(Y|X).
Transfer Learning - Illustration

(Figure: abundant source training data yields knowledge that is fed, together with insufficient target training data, into the learning algorithm, which produces the target task model.)

Transfer Learning is beneficial for lessening the labeling costs associated with re-training a model from
scratch and for making classification rapidly adaptable in real time.
Transfer Settings [3]
• Inductive Transfer : a few labeled target-domain examples are available for obtaining a weak inductive bias; source data is used as auxiliary data.
• Transductive Transfer : lots of labeled source data and lots of unlabeled target-domain data; capitalize on the difference between the domains.
• Unsupervised Transfer : both source data and target data are unlabeled; apply techniques such as clustering and density estimation.

The scope of Transfer Learning in general is to learn a classifier that performs well over
target data samples alone. Classification performance over the source tasks is ignored.

[3] Pan, S.J. and Yang, Q., "A Survey on Transfer Learning", TKDE, 2009
Notation
• Two sets of tasks, source and target, represented by instances X_source, X_target ∈ X and labels Y_source, Y_target ∈ Y such that
P(X_source, Y_source) ≠ P(X_target, Y_target)
• Training examples are grouped and named based on their task distributions:
– Same task distribution as the target: T_s = {(x_i^s, y_i^s)}_{i=1}^m
– Different task distribution from that of the target: T_d = {(x_j^d, y_j^d)}_{j=1}^n
• Unlabeled test examples representing the target tasks: S
Problem Statement
• Abundant source data: T_d = {(x_i^d, y_i^d)}_{i=1}^n, from which a source model is trained.
• Little labeled target data: T_s = {(x_j^s, y_j^s)}_{j=1}^m, from which a target model is trained.
• Unseen target data: S.

Objective: Given |T_s| ≪ |T_d| and that T_s is insufficient to learn the
target tasks, learn a model using T_d ∪ T_s that classifies target task
examples S with minimum error.
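In code, this setup can be sketched with synthetic stand-ins; all names, shapes, and the shift injected into the target data below are illustrative assumptions, not the thesis data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the notation:
#   T_d : abundant source-distribution training data (n examples)
#   T_s : scarce target-distribution training data (m examples), |T_s| << |T_d|
n, m, d = 500, 20, 44                                  # 44 features, 500 source instances
X_d, y_d = rng.normal(0.0, 1.0, (n, d)), rng.integers(0, 2, n)
X_s, y_s = rng.normal(0.5, 1.0, (m, d)), rng.integers(0, 2, m)  # shifted target

# The training pool is the union T_d ∪ T_s; a flag records each example's origin
X_train = np.vstack([X_d, X_s])
y_train = np.concatenate([y_d, y_s])
is_target = np.concatenate([np.zeros(n, bool), np.ones(m, bool)])
```

The origin flag is what lets later stages weight source and target instances separately.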
What to Transfer?
• Instance-based : reuse instances observed in the source domain that are similar to the target domain. E.g. instance reweighting, importance sampling.
• Feature-based : find an alternate feature space for learning the target domain while projecting the source domain into the new space. E.g. feature subset selection, feature space transformation.
• Model/Parameter-based : use model components such as parameters and hyper-parameters to influence learning the target task. E.g. parameter-space partitioning, superimposing shape constraints.
Outline
• Motivation
• Transfer Learning
• Instance-Weighting using Boosting : Instance
Weighting, AdaBoost, TrAdaBoost, TransferBoost, Limitations
• Cost-Sensitive Boosting
• Results and Discussions
• Conclusion
• AdaBoost[4] boosts a weak learning algorithm into a strong
learner by linearly combining an ensemble of weak
hypotheses.
• Why Boosting based Instance-Weighting?
– Provides theoretical guarantees on generalization error bounds.
– Incremental instance boosting aids in systematic selection of
important examples
– Well-defined focus areas to be modified for knowledge transfer
• Weak hypothesis loss function
• Weight update scheme
• Linear combination of the weak hypotheses
Boosting
[4] Freund, Y., Schapire, R. and Abe, N., "A Short Introduction to Boosting", JSAI, 1999
Instance-Weighting and Boosting

(Figure: a two-stage pipeline. Stage 1 computes a similarity measure; Stage 2 runs AdaBoost, whose weak-hypothesis loss function, weight update scheme and linear combination of weak hypotheses are the points modified for transfer.)
• Two recent instance-weighting algorithms adapt AdaBoost
for knowledge transfer :
• TrAdaBoost[5]
• TransferBoost[6]
[5] Dai, W., et al., "Boosting for Transfer Learning", ICML, 2007
[6] Eaton, E. and desJardins, M., "Set-Based Boosting for Instance-level Transfer", IEEE ICDM Workshops, 2009
AdaBoost
• Loss function: ε_t = Σ_{i=1}^N p_i^t |h_t(x_i) − y_i|, with α_t = (1/2) log((1 − ε_t)/ε_t)
• Weight update: w_i^{t+1} = w_i^t exp(−α_t y_i h_t(x_i))
• Linear combination: H(x) = sign(Σ_{t=1}^T α_t h_t(x))

Main Idea: increase the weights of misclassified training samples (T_d ∪ T_s)
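The three components above translate directly into code. The following is a minimal sketch, assuming a single-feature decision-stump weak learner and labels in {−1, +1}; the names and the stump choice are illustrative, not the thesis implementation:

```python
import numpy as np

def _best_stump(X, y, w):
    """Weak learner: the best single-feature threshold stump under weights w."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] >= thr, sign, -sign)
                err = np.sum(w * (pred != y))
                if best is None or err < best[0]:
                    best = (err, (j, thr, sign), pred)
    return best[1], best[2]

def train_adaboost(X, y, T=10):
    """Minimal AdaBoost following the slide's formulas (y in {-1,+1})."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(T):
        stump, pred = _best_stump(X, y, w)
        eps = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)  # weighted error
        alpha = 0.5 * np.log((1 - eps) / eps)                     # alpha_t
        w *= np.exp(-alpha * y * pred)                            # weight update
        w /= w.sum()                                              # renormalize
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, X):
    """H(x) = sign(sum_t alpha_t h_t(x))."""
    score = sum(a * np.where(X[:, j] >= thr, s, -s)
                for a, (j, thr, s) in ensemble)
    return np.sign(score)
```

Misclassified examples keep larger weights after normalization, so later stumps concentrate on them.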
Limitations
• TrAdaBoost
– decreases the weights of supporting source-domain instances, making knowledge transfer inefficient.
– converging over the target error makes it prone to overfitting.
• TransferBoost
– positive transferability is hard to come by due to the small size of T_s.
– requires external information about the structure of the data to be of any use.
Outline
• Motivation
• Transfer Learning
• Instance-Weighting using Boosting
• Cost-Sensitive Boosting : General Idea, Weight Update
Schemes, Algorithm, Cost Estimation, Dynamic Cost
• Results and Discussions
• Conclusion
General Idea

(Figure: the two-stage pipeline again; Stage 1's similarity measure supplies cost factors to Stage 2's AdaBoost weak-hypothesis loss function, weight update scheme and linear combination.)

• Compute instance-weights for T_d and T_s separately.
• Augment T_d instances with computed cost factors C.
• Learn a strong classifier to minimize the training error over T_s and reduce the net misclassification cost over T_d.
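As a sketch of a cost-augmented weight update (following the AdaC2 scheme of Sun et al. [7], where the cost multiplies the weight outside the exponent), one boosting round might look like this; the function name and the epsilon guards are assumptions:

```python
import numpy as np

def adac2_round(w, cost, y, pred):
    """One AdaC2-style boosting round (after Sun et al. [7]).

    cost[i] = 1 for T_s (target) examples and lies in [0,1] for T_d
    (source) examples, so dissimilar source instances contribute less.
    y and pred are labels/predictions in {-1,+1}.
    """
    correct = (pred == y)
    # Cost-weighted alpha: more trust when costly examples are classified right
    num = np.sum(cost[correct] * w[correct])
    den = np.sum(cost[~correct] * w[~correct])
    alpha = 0.5 * np.log((num + 1e-10) / (den + 1e-10))
    # The cost multiplies the weight outside the exponent (the AdaC2 scheme)
    w_new = cost * w * np.exp(-alpha * y * pred)
    return alpha, w_new / w_new.sum()
```

A misclassified high-cost example gains the most weight, while a misclassified low-cost source example is boosted far less aggressively.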
Weight Update Schemes[7]
[7] Sun, Y., et al., "Cost-sensitive Boosting for Classification of Imbalanced Data", Pattern Recognition, 2007
Cost Properties
• Represents the similarity of instance distributions and
classification functions between 𝑇𝑠 and 𝑇𝑑.
• Lies in the interval [0,1].
• Relevant examples have cost values lying closer to 1.
• 𝑇𝑑 examples that have a cost, 𝑐𝑖 = 0 are not used for
training.
Cost Estimation
• Instance Pruning [8] : the probability of correct classification of an instance by a model trained on T_s.
• Relevance Measure : for each T_d instance, the ratio

  Σ_{j : y_i^d ≠ y_j^s} dist(x_i^d, x_j^s) / Σ_{j : y_i^d = y_j^s} dist(x_i^d, x_j^s)

[8] Jiang, J. and Zhai, C.X., "Instance Weighting for Domain Adaptation in NLP", ACL, 2007
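A literal reading of the relevance measure can be sketched as follows; the choice of Euclidean distance and the map of the raw ratio into [0,1] are added assumptions so that the output satisfies the stated cost properties:

```python
import numpy as np

def relevance_costs(X_d, y_d, X_s, y_s):
    """Distance-ratio relevance sketch: for each T_d example, summed distance
    to differently-labelled T_s examples over summed distance to
    same-labelled ones. High ratio = source instance sits near the right
    class region of the target data = relevant."""
    costs = np.empty(len(X_d))
    for i, (x, y) in enumerate(zip(X_d, y_d)):
        dist = np.linalg.norm(X_s - x, axis=1)
        diff = dist[y_s != y].sum()
        same = dist[y_s == y].sum() + 1e-10
        ratio = diff / same
        costs[i] = ratio / (1.0 + ratio)   # assumed squashing into [0,1]
    return costs
```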
Cost Estimation
• KL Importance Estimation Procedure [9] : transductively estimates P(x_i^s)/P(x_i^t) by minimizing the KL divergence between the distributions of T_d and T_s.
• Concept Feature Vector Distance [10] : measures the distance between the Concept Feature Vectors that represent the different class labels in T_d and T_s.

[9] Sugiyama, M., et al., "Direct Importance Estimation with Model Selection and its Application to Covariate Shift Adaptation", NIPS, 2008
[10] Katakis, I., et al., "An Ensemble of Classifiers for Coping with Recurring Contexts in Data Streams", ECAI, 2008
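The cited CFV construction is not reproduced here; purely as an illustration, one plausible centroid-based variant is sketched below. Treating each class's feature centroid as its concept feature vector, and the exp(−d) map into [0,1], are both assumptions and not details taken from [10]:

```python
import numpy as np

def cfv_class_costs(X_d, y_d, X_s, y_s):
    """Per-class cost from the distance between the class centroids
    ('concept feature vectors', assumed) of T_d and T_s. Classes whose
    centroids coincide across domains get costs near 1."""
    costs = {}
    for label in np.unique(y_d):
        cfv_d = X_d[y_d == label].mean(axis=0)
        cfv_s = X_s[y_s == label].mean(axis=0)
        costs[label] = float(np.exp(-np.linalg.norm(cfv_d - cfv_s)))
    return costs
```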
Dynamic Cost-Sensitive Boosting

(Algorithm listing; the dynamic variant adds step 11: update the cost vector C by calling the cost estimation procedure along with the weights of T_s.)
Outline
• Motivation
• Transfer Learning
• Instance-Weighting using Boosting
• Cost-Sensitive Boosting
• Results and Discussions : Datasets, Classification
Accuracies, Dominance of AdaC2, vs. % of Training Data, Effect of Cost,
Dynamic Cost, Multisource Transfer
• Conclusion
Datasets
Act_gest : Accelerometer Based 3D Gesture Recognition (4 datasets)
• Source and Target Datasets
– Multi-class mock laboratory data (20 action samples from 5 users)
– Multi-class real-life data (4 users made 4 glasses of Gatorade and drank them)
– 44 features and 500 source instances.
• Factors that induce dataset shift:
– Environmental factors, including the size, shape and weight of real-world objects
– User traits
• Average cross-validation accuracy was obtained over 5 trials.
The activity gesture dataset shows clear signs of a domain shift upon performing PCA on the feature
points and projecting its instances onto the first three principal components.
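A sketch of the projection behind such a plot, assuming plain SVD-based PCA on the pooled data (the function name is illustrative):

```python
import numpy as np

def project_pca3(X_source, X_target):
    """Project both domains onto the first three principal components
    of the pooled, mean-centered data, for visualizing domain shift."""
    X = np.vstack([X_source, X_target])
    mean = X.mean(axis=0)
    # Right singular vectors of the centered data give the principal directions
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W = Vt[:3].T
    return (X_source - mean) @ W, (X_target - mean) @ W
```

Plotting the two returned point clouds in 3D makes a shift between domains visible as separated clusters.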
Datasets
WSU Smart Home Activity Recognition (7 datasets)
• Multi-Source Datasets
– Multi-class activity data captured from 7 different smart-home test beds.
– Modeled into single source and target datasets using one vs. all.
– 19 features and a maximum of 5468 instances per source.
• Factors that introduce dataset shift:
– Different apartment layouts
– Different residents
• Average cross-validation accuracy was obtained over 5 trials.
The WSU Activity Recognition datasets show signs of a shift in P(X). Of particular interest is how the dataset shift varies in agreement with the actual task in question.
Datasets
20Newsgroups 1 (6 datasets)
• Source and Target Datasets
– 65K features were reduced to 45K using document frequency thresholding.
– All features were encoded as binary.
– Modeled into a binary classification dataset with class labels as one subcategory vs. another.
• Factors that introduce concept drift in the newsgroup datasets:
– Different term frequencies
– Synthetically generated from different subcategories.
• Average cross-validation accuracy was obtained over 5 trials.
20Newsgroups 2 (7 datasets)
• A multi-source variation containing one subcategory vs. noisy subcategories.
Outline
• Motivation
• Transfer Learning
• Instance-Weighting using Boosting
• Cost-Sensitive Boosting
• Results and Discussions
• Conclusion : Conclusion, Thesis Summary, Future Directions,
Dissemination
Conclusion
Pros
• An extension of AdaBoost for Transfer Learning.
• Performs better than existing instance-transfer techniques on real-world datasets.
• Provides flexibility in using different relatedness measures and base classifiers.
• Has a good theoretical basis.
Cons
• May be prone to overfitting.
• Performance is dependent on the effectiveness of the estimated cost.
• Relies on being a bottom-up weighting approach; does not utilize a given structure of the data.
• Cost-sensitive boosting schemes were evaluated over real-world datasets and compared against well-known algorithms.
• 3 variants of cost-sensitive boosting algorithms were investigated; AdaC2 was found to be the best among them.
• 4 different relatedness measures were evaluated; instance pruning was found to give the best results.
• Effect of maintaining a dynamic cost scheme was studied.
• Equivalence of AdaC2 with respect to multisource transfer
learning was analyzed.
Summary
• Estimating Relatedness
– Does a better a priori relatedness measure exist?
• Target Domain Instance Selection
– How to optimally select instances from the target
domain?
• Discovering Structure in datasets
– How can an existing structure in the data be capitalized on?
• System Integration
– How to best integrate these methodologies into an
application framework?
Future Directions
Dissemination
• A. Venkatesan, N. C. Krishnan, and S. Panchanathan, "Cost-sensitive Boosting for Concept Drift", ECML Workshop on Handling Concept Drift in Adaptive Information Systems (HaCDAIS), Barcelona, Spain, 2010.
• N. C. Krishnan, A. Venkatesan, S. Panchanathan, and D. Cook, "Cost-sensitive Boosting for Transfer Learning", in preparation for submission to IEEE Transactions on Knowledge and Data Engineering.