Self-Taught Learning

ICML 2007 paper report

Transcript

  • 1. Self-taught learning: Transfer learning from unlabeled data
    Authors: Rajat Raina et al.
    @Stanford University
    Presenter: Shao-Chuan Wang
    Some slides are from the authors' work.
  • 2. Self-taught learning: Transfer learning from unlabeled data
    Objective:
    Use unlabeled data to improve performance on a classification task.
    Key ideas:
    Relax the assumptions about the unlabeled data.
    Use unlabeled data to learn the best representation (dictionary).
  • 3. Machine Learning Schemes
    • Supervised learning
  • 4. Semi-supervised learning.
  • 5. Transfer learning.
  • 6. Next: Self-taught learning?
    (Figure: example data for each scheme: labeled cars and motorcycles; unlabeled cars and motorcycles; labeled bus, tractor, aircraft, and helicopter; unlabeled natural scenes.)
  • 7. Self-taught Learning
    Labeled examples and unlabeled examples (notation sketched below).
    The unlabeled and labeled data:
    need not share the labels y;
    need not share a generative distribution.
    Advantage: such unlabeled data is often easy to obtain.
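A minimal sketch of the problem setup in the paper's notation (the formulas on this slide were images and did not survive extraction, so the symbols below are reconstructed rather than quoted):

    \text{Labeled examples: } \{(x_l^{(1)}, y^{(1)}), \dots, (x_l^{(m)}, y^{(m)})\}, \quad x_l^{(i)} \in \mathbb{R}^n, \; y^{(i)} \in \{1, \dots, C\}
    \text{Unlabeled examples: } \{x_u^{(1)}, \dots, x_u^{(k)}\}, \quad x_u^{(i)} \in \mathbb{R}^n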
  • 8. Self-taught learning
    Use unlabeled data to learn the best representation.
  • 9. Learning the structure
    Sparse coding: learning the dictionary.
    Encoding (an L1-regularized least-squares problem):
    Least-angle regression (Efron et al., 2004) (very fast!)
    Feature-sign search (Honglak Lee et al., 2006)
    Coordinate descent (Friedman et al., 2008)
    Dictionary update (an L2-constrained least-squares problem):
    K-SVD (Aharon et al., 2006)
    Online dictionary learning (Mairal et al., 2009)
    Objective: reconstruction error + sparsity penalty (sketched below).
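A sketch of the objective this slide refers to, following the sparse-coding formulation in the Raina et al. paper (the equation itself was an image, so the exact symbols below are reconstructed; beta is the sparsity weight):

    \min_{b,\, a} \;\; \sum_i \Big\| x_u^{(i)} - \textstyle\sum_j a_j^{(i)} b_j \Big\|_2^2 \;+\; \beta \sum_i \big\| a^{(i)} \big\|_1
    \qquad \text{subject to } \; \|b_j\|_2 \le 1 \;\; \forall j

With the dictionary b fixed, minimizing over the activations a is the L1-regularized least-squares (encoding) step; with a fixed, minimizing over b is the L2-constrained least-squares (dictionary-update) step, and the two are alternated.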
  • 10. Self-taught learning: flow
    Unlabeled data → learning algorithm (K-SVD, online dictionary learning, …) → dictionary.
    Training data → encoding (the Lasso) → encoded training features.
    Testing data → encoding (the Lasso) → encoded testing features.
    Encoded features → SVM with Fisher kernel → predicted labels.
    (A code sketch of this pipeline follows.)
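A minimal, runnable sketch of this flow using scikit-learn. It is an illustration under assumptions rather than the authors' implementation: the dictionary-learning and Lasso-encoding steps mirror the diagram, but a plain LinearSVC stands in for the Fisher-kernel SVM, and the data shapes and hyperparameters (n_components, alpha) are made up for the example.

import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)

# Stand-in data: unlabeled patches plus a small labeled set (shapes are illustrative only).
X_unlabeled = rng.randn(5000, 64)                  # e.g. flattened 8x8 image patches
X_train, y_train = rng.randn(200, 64), rng.randint(0, 2, 200)
X_test = rng.randn(50, 64)

# 1. Learn a dictionary (set of basis vectors) from the unlabeled data.
dico = MiniBatchDictionaryLearning(n_components=128, alpha=1.0, random_state=0)
dictionary = dico.fit(X_unlabeled).components_     # shape (128, 64)

# 2. Encode the labeled training and testing data as sparse activations (the Lasso step).
A_train = sparse_encode(X_train, dictionary, algorithm='lasso_lars', alpha=1.0)
A_test = sparse_encode(X_test, dictionary, algorithm='lasso_lars', alpha=1.0)

# 3. Train a classifier on the sparse codes and predict labels for the test set.
clf = LinearSVC().fit(A_train, y_train)
predicted_labels = clf.predict(A_test)

Swapping LinearSVC for an SVM with a custom (e.g. Fisher) kernel over the sparse codes would be the closer analogue of the slide's pipeline.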
  • 11. SVM with Fisher kernel
    Fisher kernel (sketched below).
    In the Bayesian view, sparse coding is MAP estimation of the activations under a Laplace prior.
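A sketch of the standard Fisher-kernel construction behind this slide; the exact kernel used in the talk is not recoverable from the transcript, so only the general definition, together with the Laplace prior that makes MAP encoding equivalent to the Lasso, is stated here:

    \text{Laplace prior: } \; p(a) \propto \exp(-\beta \|a\|_1), \qquad
    \hat{a}(x) = \arg\max_a \; p(x \mid a, B)\, p(a) \;\; \text{(the L1-regularized encoding)}

    \text{Fisher score: } \; U_x = \nabla_\theta \log p(x \mid \theta), \qquad
    \text{Fisher kernel: } \; K(x, x') = U_x^\top \, \mathcal{I}^{-1} \, U_{x'}

Here \mathcal{I} is the Fisher information matrix, often approximated by the identity in practice; the resulting kernel is then used in a standard SVM.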
  • 12. Example atoms
    Natural images: the learned atoms look like "edges".
    Handwritten characters: the learned atoms look like "strokes".
    Sparse representation gives a higher-level representation.
  • 13. Result: Character recognition
    Datasets: digits, handwritten English characters, English font characters.
    Handwritten English classification (20 labeled images per handwritten character), bases learnt on digits: 8.2% error reduction.
    English font classification (20 labeled images per font character), bases learnt on handwritten English: 2.8% error reduction.
  • 14. Image classification (15 labeled images per class)
    Other reported results:
    Fei-Fei et al., 2004: 16%
    Berg et al., 2005: 17%
    Holub et al., 2005: 40%
    Serre et al., 2005: 35%
    Berg et al., 2005: 48%
    Zhang et al., 2006: 59%
    Lazebnik et al., 2006: 56%
    Self-taught learning: 36.0% error reduction.
  • 15. Discussion
    Why sparse coding? Can this idea (learning structure from unlabeled images) be applied to other encoding schemes?
    Probably because of the nonlinear coding from x to the activations a: it emulates the "end-stopping" phenomenon (a feature is maximally activated by edges of only a specific orientation and length).
  • 16. Summary
    • Self-taught learning: Unlabeled data does not share the labels of the classification task (e.g., classify cars vs. motorcycles using unlabeled natural images).
    • 17. Use unlabeled data to discover features.
    • 18. Use sparse coding to construct an easy-to-classify, "higher-level" representation.
    (Figure: an unlabeled image patch expressed as a sparse combination of learned bases, roughly x ≈ 0.8·b_1 + 0.3·b_2 + 0.5·b_3.)