This document presents structured support vector machines (SSVMs), a method for learning the parameters of structured prediction models with the goal of minimizing expected loss. Because the expected loss is unavailable and discontinuous in the parameters, SSVMs minimize an empirical estimate over a training set and replace the loss with a convex upper bound (the structured hinge loss), which can then be optimized numerically, e.g. by subgradient descent or quadratic programming. Worked examples include multiclass and hierarchical classification.
04 structured support vector machine
1. Part 5: Structured Support Vector Machines

Sebastian Nowozin and Christoph H. Lampert
Colorado Springs, 25th June 2011
2–3. Problem (Loss-Minimizing Parameter Learning)

Let $d(x,y)$ be the (unknown) true data distribution.
Let $\mathcal{D} = \{(x^1,y^1),\dots,(x^N,y^N)\}$ be i.i.d. samples from $d(x,y)$.
Let $\varphi : \mathcal{X}\times\mathcal{Y}\to\mathbb{R}^D$ be a feature function.
Let $\Delta : \mathcal{Y}\times\mathcal{Y}\to\mathbb{R}$ be a loss function.

Find a weight vector $w^*$ that leads to minimal expected loss
$$\mathbb{E}_{(x,y)\sim d(x,y)}\{\Delta(y, f(x))\}$$
for $f(x) = \operatorname{argmax}_{y\in\mathcal{Y}}\langle w, \varphi(x,y)\rangle$.

Pro:
- We directly optimize for the quantity of interest: the expected loss.
- No expensive-to-compute partition function $Z$ will show up.
Con:
- We need to know the loss function already at training time.
- We can't use probabilistic reasoning to find $w^*$.
4. Reminder: learning by regularized risk minimization

For the compatibility function $g(x,y;w) := \langle w,\varphi(x,y)\rangle$, find $w^*$ that minimizes
$$\mathbb{E}_{(x,y)\sim d(x,y)}\,\Delta\big(y,\ \operatorname{argmax}_y g(x,y;w)\big).$$

Two major problems:
- $d(x,y)$ is unknown
- $\operatorname{argmax}_y g(x,y;w)$ maps into a discrete space
  → $\Delta\big(y, \operatorname{argmax}_y g(x,y;w)\big)$ is discontinuous and piecewise constant in $w$
5. Task:
$$\min_w\ \mathbb{E}_{(x,y)\sim d(x,y)}\,\Delta\big(y,\ \operatorname{argmax}_y g(x,y;w)\big).$$

Problem 1: $d(x,y)$ is unknown.
Solution:
- Replace the expectation $\mathbb{E}_{(x,y)\sim d(x,y)}[\cdot]$ by the empirical estimate $\frac{1}{N}\sum_{(x^n,y^n)}[\cdot]$.
- To avoid overfitting, add a regularizer, e.g. $\lambda\|w\|^2$.

New task:
$$\min_w\ \lambda\|w\|^2 + \frac{1}{N}\sum_{n=1}^N \Delta\big(y^n,\ \operatorname{argmax}_y g(x^n,y;w)\big).$$
6. Task:
$$\min_w\ \lambda\|w\|^2 + \frac{1}{N}\sum_{n=1}^N \Delta\big(y^n,\ \operatorname{argmax}_y g(x^n,y;w)\big).$$

Problem 2: $\Delta\big(y,\ \operatorname{argmax}_y g(x,y;w)\big)$ is discontinuous with respect to $w$.
Solution:
- Replace $\Delta(y,y')$ by a well-behaved surrogate $\ell(x,y,w)$.
- Typically: an upper bound to $\Delta$, continuous and convex with respect to $w$.

New task:
$$\min_w\ \lambda\|w\|^2 + \frac{1}{N}\sum_{n=1}^N \ell(x^n,y^n,w).$$
7–10. Regularized Risk Minimization
$$\min_w\ \lambda\|w\|^2 + \frac{1}{N}\sum_{n=1}^N \ell(x^n,y^n,w)$$
= Regularization + Loss on training data

Hinge loss: maximum-margin training
$$\ell(x^n,y^n,w) := \max_{y\in\mathcal{Y}}\big[\Delta(y^n,y) + \langle w,\varphi(x^n,y)\rangle - \langle w,\varphi(x^n,y^n)\rangle\big]$$

- $\ell$ is a maximum over functions that are linear in $w$ → continuous, convex.
- $\ell$ bounds $\Delta$ from above.
  Proof: let $\bar{y} = \operatorname{argmax}_y g(x^n,y,w)$. Then
  $$\Delta(y^n,\bar{y}) \le \Delta(y^n,\bar{y}) + g(x^n,\bar{y},w) - g(x^n,y^n,w) \le \max_{y\in\mathcal{Y}}\big[\Delta(y^n,y) + g(x^n,y,w) - g(x^n,y^n,w)\big].$$
  (The first inequality holds because $g(x^n,\bar{y},w)\ge g(x^n,y^n,w)$ by the definition of $\bar{y}$.)

Alternative – logistic loss: probabilistic training
$$\ell(x^n,y^n,w) := \log\sum_{y\in\mathcal{Y}}\exp\big(\langle w,\varphi(x^n,y)\rangle - \langle w,\varphi(x^n,y^n)\rangle\big)$$
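To make the surrogate concrete, below is a minimal NumPy sketch of the structured hinge loss (not from the slides). It assumes user-supplied functions `phi(x, y)` and `delta(y, y_prime)` and a label set small enough to enumerate; in real structured problems the maximization over $\mathcal{Y}$ is done by a loss-augmented argmax solver instead of brute force.

```python
import numpy as np

def structured_hinge_loss(w, x, y_true, phi, delta, label_set):
    """Structured hinge loss:
    max_y [ delta(y_true, y) + <w, phi(x, y)> - <w, phi(x, y_true)> ].
    Brute-force enumeration of label_set; a sketch, not production code."""
    score_true = np.dot(w, phi(x, y_true))
    return max(delta(y_true, y) + np.dot(w, phi(x, y)) - score_true
               for y in label_set)
```

Because it is a pointwise maximum of functions that are linear in `w`, the returned value is convex in `w`, and it upper-bounds $\Delta(y^n, f(x^n))$ as shown in the proof above.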
11. Structured Output Support Vector Machine:
$$\min_w\ \frac{1}{2}\|w\|^2 + \frac{C}{N}\sum_{n=1}^N \max_{y\in\mathcal{Y}}\big[\Delta(y^n,y) + \langle w,\varphi(x^n,y)\rangle - \langle w,\varphi(x^n,y^n)\rangle\big]$$

Conditional Random Field:
$$\min_w\ \frac{\|w\|^2}{2\sigma^2} + \sum_{n=1}^N \log\sum_{y\in\mathcal{Y}}\exp\big(\langle w,\varphi(x^n,y)\rangle - \langle w,\varphi(x^n,y^n)\rangle\big)$$

CRFs and SSVMs have more in common than usually assumed:
- both do regularized risk minimization
- $\log\sum_y \exp(\cdot)$ can be interpreted as a soft-max
12. Solving the Training Optimization Problem Numerically

Structured Output Support Vector Machine:
$$\min_w\ \frac{1}{2}\|w\|^2 + \frac{C}{N}\sum_{n=1}^N \max_{y\in\mathcal{Y}}\big[\Delta(y^n,y) + \langle w,\varphi(x^n,y)\rangle - \langle w,\varphi(x^n,y^n)\rangle\big]$$

Unconstrained optimization, convex, non-differentiable objective.
13. Structured Output SVM (equivalent formulation):
$$\min_{w,\xi}\ \frac{1}{2}\|w\|^2 + \frac{C}{N}\sum_{n=1}^N \xi^n$$
subject to, for $n = 1,\dots,N$:
$$\max_{y\in\mathcal{Y}}\big[\Delta(y^n,y) + \langle w,\varphi(x^n,y)\rangle - \langle w,\varphi(x^n,y^n)\rangle\big] \le \xi^n$$
$N$ non-linear constraints, convex, differentiable objective.
14. Structured Output SVM (also equivalent formulation):
$$\min_{w,\xi}\ \frac{1}{2}\|w\|^2 + \frac{C}{N}\sum_{n=1}^N \xi^n$$
subject to, for $n = 1,\dots,N$:
$$\Delta(y^n,y) + \langle w,\varphi(x^n,y)\rangle - \langle w,\varphi(x^n,y^n)\rangle \le \xi^n,\quad\text{for all } y\in\mathcal{Y}$$
$N\,|\mathcal{Y}|$ linear constraints, convex, differentiable objective.
15. Example: Multiclass SVM
$$\mathcal{Y} = \{1,2,\dots,K\},\qquad \Delta(y,y') = \begin{cases}1 & \text{for } y\ne y'\\ 0 & \text{otherwise}\end{cases}$$
$$\varphi(x,y) = \big([y{=}1]\,\varphi(x),\ [y{=}2]\,\varphi(x),\ \dots,\ [y{=}K]\,\varphi(x)\big)\quad\text{(Iverson brackets } [\cdot])$$

Solve:
$$\min_{w,\xi}\ \frac{1}{2}\|w\|^2 + \frac{C}{N}\sum_{n=1}^N \xi^n$$
subject to, for $n = 1,\dots,N$:
$$\langle w,\varphi(x^n,y^n)\rangle - \langle w,\varphi(x^n,y)\rangle \ge 1 - \xi^n\quad\text{for all } y\in\mathcal{Y}\setminus\{y^n\}.$$
Classification: $f(x) = \operatorname{argmax}_{y\in\mathcal{Y}}\langle w,\varphi(x,y)\rangle$.

This is the Crammer–Singer multiclass SVM.
[K. Crammer, Y. Singer: "On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines", JMLR, 2001]
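The stacked feature map above can be written out directly. The following sketch (hypothetical helper names, assuming integer labels $0,\dots,K{-}1$ and a precomputed input feature vector `phi_x`) also implements the loss-augmented argmax needed later for training, which for multiclass is a simple enumeration:

```python
import numpy as np

def joint_feature(phi_x, y, K):
    """Crammer-Singer joint feature map: phi_x is copied into the y-th
    of K stacked blocks, all other blocks stay zero."""
    D = phi_x.shape[0]
    out = np.zeros(K * D)
    out[y * D:(y + 1) * D] = phi_x
    return out

def zero_one_loss(y, y_prime):
    return 0.0 if y == y_prime else 1.0

def loss_augmented_argmax(w, phi_x, y_true, K):
    """argmax_y [ Delta(y_true, y) + <w, phi(x, y)> ] by enumerating K labels."""
    scores = [zero_one_loss(y_true, y) + w @ joint_feature(phi_x, y, K)
              for y in range(K)]
    return int(np.argmax(scores))
```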
16. Example: Hierarchical SVM

Hierarchical multiclass loss:
$$\Delta(y,y') := \frac{1}{2}\,(\text{distance in the class tree})$$
e.g. $\Delta(\text{cat},\text{cat}) = 0$, $\Delta(\text{cat},\text{dog}) = 1$, $\Delta(\text{cat},\text{bus}) = 2$, etc.

Solve:
$$\min_{w,\xi}\ \frac{1}{2}\|w\|^2 + \frac{C}{N}\sum_{n=1}^N \xi^n$$
subject to, for $n = 1,\dots,N$:
$$\langle w,\varphi(x^n,y^n)\rangle - \langle w,\varphi(x^n,y)\rangle \ge \Delta(y^n,y) - \xi^n\quad\text{for all } y\in\mathcal{Y}.$$

[L. Cai, T. Hofmann: "Hierarchical Document Categorization with Support Vector Machines", ACM CIKM, 2004]
[A. Binder, K.-R. Müller, M. Kawanabe: "On taxonomies for multi-class image categorization", IJCV, 2011]
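The tree loss itself is straightforward to compute. A small sketch, assuming the class hierarchy is given as a hypothetical child-to-parent dictionary (the root maps to None):

```python
def tree_distance(a, b, parent):
    """Number of edges between nodes a and b in a tree given as a
    child -> parent map (the root maps to None)."""
    def ancestors(n):
        path = []
        while n is not None:
            path.append(n)
            n = parent.get(n)
        return path
    pa, pb = ancestors(a), ancestors(b)
    common = set(pa) & set(pb)
    # depth of the lowest common ancestor as seen from each side
    da = next(i for i, n in enumerate(pa) if n in common)
    db = next(i for i, n in enumerate(pb) if n in common)
    return da + db

def hierarchical_loss(y, y_prime, parent):
    """Delta(y, y') = 1/2 * (tree distance), as on the slide."""
    return 0.5 * tree_distance(y, y_prime, parent)

# e.g. parent = {"cat": "animal", "dog": "animal", "bus": "vehicle",
#                "animal": "root", "vehicle": "root", "root": None}
# hierarchical_loss("cat", "dog", parent) == 1.0
# hierarchical_loss("cat", "bus", parent) == 2.0
```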
17. Solving the Training Optimization Problem Numerically

We can solve S-SVM training like CRF training:
$$\min_w\ \frac{1}{2}\|w\|^2 + \frac{C}{N}\sum_{n=1}^N \max_{y\in\mathcal{Y}}\big[\Delta(y^n,y) + \langle w,\varphi(x^n,y)\rangle - \langle w,\varphi(x^n,y^n)\rangle\big]$$
The objective is continuous, unconstrained and convex, but non-differentiable
→ we can't use gradient descent directly
→ we'll have to use subgradients.
18–21. Definition (Subgradient)

Let $f:\mathbb{R}^D\to\mathbb{R}$ be a convex, not necessarily differentiable, function.
A vector $v\in\mathbb{R}^D$ is called a subgradient of $f$ at $w_0$ if
$$f(w) \ge f(w_0) + \langle v,\ w - w_0\rangle\quad\text{for all } w.$$

[Figure: the affine function $f(w_0) + \langle v, w - w_0\rangle$ supports $f$ from below and touches it at $w_0$; at a kink of $f$, several such supporting lines, i.e. several subgradients, exist.]

For differentiable $f$, the gradient $v = \nabla f(w_0)$ is the only subgradient.
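A standard one-dimensional illustration (not from the slides): for $f(w) = |w|$, every $v\in[-1,1]$ is a subgradient at $w_0 = 0$, since $|w| \ge v\cdot w$ holds for all $w$ exactly when $|v|\le 1$. At the kink the subgradient is a whole set, $\partial f(0) = [-1,1]$, while for $w_0 \ne 0$ it reduces to the single gradient $\operatorname{sign}(w_0)$.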
22. Subgradient descent works basically like gradient descent:

Subgradient Descent Minimization – minimize $F(w)$
require: tolerance $\epsilon > 0$, stepsizes $\eta_t$
$w_{\mathrm{cur}} \leftarrow 0$
repeat
  pick any $v \in \partial_w F(w_{\mathrm{cur}})$
  $w_{\mathrm{cur}} \leftarrow w_{\mathrm{cur}} - \eta_t v$
until $F$ changed less than $\epsilon$
return $w_{\mathrm{cur}}$

Converges to the global minimum, but rather inefficient if $F$ is non-differentiable.
[Shor, "Minimization methods for non-differentiable functions", Springer, 1985.]
23–30. Computing a subgradient:
$$\min_w\ \frac{1}{2}\|w\|^2 + \frac{C}{N}\sum_{n=1}^N \ell^n(w)$$
with $\ell^n(w) = \max_y \ell^n_y(w)$, and
$$\ell^n_y(w) := \Delta(y^n,y) + \langle w,\varphi(x^n,y)\rangle - \langle w,\varphi(x^n,y^n)\rangle.$$

For each $y\in\mathcal{Y}$, $\ell^n_y(w)$ is a linear function of $w$, so $\ell^n(w) = \max_y \ell^n_y(w)$ is the maximum over all $y\in\mathcal{Y}$ of a family of linear functions.
[Figure: several lines $\ell^n_y(w)$; their upper envelope is the piecewise-linear, convex function $\ell^n(w)$.]

Subgradient of $\ell^n$ at $w_0$: find the maximal (active) $y$, then use $v = \nabla\ell^n_y(w_0)$.
31. Subgradient Descent S-SVM Training

input: training pairs $\{(x^1,y^1),\dots,(x^N,y^N)\}\subset\mathcal{X}\times\mathcal{Y}$,
input: feature map $\varphi(x,y)$, loss function $\Delta(y,y')$, regularizer $C$,
input: number of iterations $T$, stepsizes $\eta_t$ for $t = 1,\dots,T$
1: $w \leftarrow 0$
2: for $t = 1,\dots,T$ do
3:   for $n = 1,\dots,N$ do
4:     $\hat{y} \leftarrow \operatorname{argmax}_{y\in\mathcal{Y}}\ \Delta(y^n,y) + \langle w,\varphi(x^n,y)\rangle - \langle w,\varphi(x^n,y^n)\rangle$
5:     $v^n \leftarrow \varphi(x^n,y^n) - \varphi(x^n,\hat{y})$
6:   end for
7:   $w \leftarrow w - \eta_t\big(w - \frac{C}{N}\sum_n v^n\big)$
8: end for
output: prediction function $f(x) = \operatorname{argmax}_{y\in\mathcal{Y}}\langle w,\varphi(x,y)\rangle$.

Observation: each update of $w$ needs one argmax-prediction per example.
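A minimal NumPy sketch of this training loop for the multiclass case, reusing the hypothetical `joint_feature`, `zero_one_loss`, and `loss_augmented_argmax` helpers from the multiclass example above; `eta` is a step-size schedule $t\mapsto\eta_t$:

```python
import numpy as np

def subgradient_ssvm_train(X, Y, K, C, T, eta):
    """Subgradient Descent S-SVM Training (slide 31) for the multiclass
    case. X: list of input feature vectors, Y: integer labels in
    {0, ..., K-1}, eta: step-size schedule t -> eta_t."""
    N, D = len(X), X[0].shape[0]
    w = np.zeros(K * D)
    for t in range(T):
        v_sum = np.zeros_like(w)
        for n in range(N):
            # loss-augmented prediction: one argmax per example and pass
            y_hat = loss_augmented_argmax(w, X[n], Y[n], K)
            v_sum += joint_feature(X[n], Y[n], K) - joint_feature(X[n], y_hat, K)
        # subgradient step on 0.5*||w||^2 + (C/N) * sum_n loss_n
        w -= eta(t) * (w - (C / N) * v_sum)
    return w

# usage sketch:
# w = subgradient_ssvm_train(X, Y, K=3, C=1.0, T=100, eta=lambda t: 1.0 / (t + 1))
```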
32. We can use the same tricks as for CRFs, e.g. stochastic updates:

Stochastic Subgradient Descent S-SVM Training
input: training pairs $\{(x^1,y^1),\dots,(x^N,y^N)\}\subset\mathcal{X}\times\mathcal{Y}$,
input: feature map $\varphi(x,y)$, loss function $\Delta(y,y')$, regularizer $C$,
input: number of iterations $T$, stepsizes $\eta_t$ for $t = 1,\dots,T$
1: $w \leftarrow 0$
2: for $t = 1,\dots,T$ do
3:   $(x^n,y^n) \leftarrow$ randomly chosen training example pair
4:   $\hat{y} \leftarrow \operatorname{argmax}_{y\in\mathcal{Y}}\ \Delta(y^n,y) + \langle w,\varphi(x^n,y)\rangle - \langle w,\varphi(x^n,y^n)\rangle$
5:   $w \leftarrow w - \eta_t\big(w - \frac{C}{N}\,[\varphi(x^n,y^n) - \varphi(x^n,\hat{y})]\big)$
6: end for
output: prediction function $f(x) = \operatorname{argmax}_{y\in\mathcal{Y}}\langle w,\varphi(x,y)\rangle$.

Observation: each update of $w$ needs only one argmax-prediction
(but we'll need many iterations until convergence).
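The corresponding one-example update, as a sketch under the same assumptions (the slide's $C/N$ scaling is kept as written):

```python
import numpy as np

def stochastic_ssvm_step(w, X, Y, K, C, eta_t, rng):
    """One update of Stochastic Subgradient Descent S-SVM Training
    (slide 32): a single random example stands in for the full sum."""
    n = rng.integers(len(X))  # e.g. rng = np.random.default_rng()
    y_hat = loss_augmented_argmax(w, X[n], Y[n], K)
    v = joint_feature(X[n], Y[n], K) - joint_feature(X[n], y_hat, K)
    return w - eta_t * (w - (C / len(X)) * v)
```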
33–36. Solving the Training Optimization Problem Numerically

We can solve an S-SVM like a linear SVM. One of the equivalent formulations was:
$$\min_{w\in\mathbb{R}^D,\ \xi\in\mathbb{R}^N_+}\ \frac{1}{2}\|w\|^2 + \frac{C}{N}\sum_{n=1}^N \xi^n$$
subject to, for $n = 1,\dots,N$:
$$\langle w,\varphi(x^n,y^n)\rangle - \langle w,\varphi(x^n,y)\rangle \ge \Delta(y^n,y) - \xi^n,\quad\text{for all } y\in\mathcal{Y}.$$

Introduce feature vectors $\delta\varphi(x^n,y^n,y) := \varphi(x^n,y^n) - \varphi(x^n,y)$; the constraints become
$$\langle w,\ \delta\varphi(x^n,y^n,y)\rangle \ge \Delta(y^n,y) - \xi^n.$$

This has the same structure as an ordinary SVM:
- quadratic objective
- linear constraints

Question: can't we use an ordinary SVM/QP solver?
Answer: almost! We could, if there weren't $N\,|\mathcal{Y}|$ constraints.
E.g. for 100 binary $16\times 16$ images: $N\,|\mathcal{Y}| = 100\cdot 2^{256} \approx 10^{79}$ constraints.
37–39. Solution: working set training

- It's enough if we enforce the active constraints; the others will be fulfilled automatically.
- We don't know which constraints are active for the optimal solution, but it's likely to be only a small number (this can of course be formalized).

Keep a set of potentially active constraints and update it iteratively:

Working Set Training
- Start with the working set $S = \emptyset$ (no constraints).
- Repeat until convergence:
  - Solve the S-SVM training problem with only the constraints in $S$.
  - Check whether the solution violates any constraint of the full set.
    - If no: we found the optimal solution; terminate.
    - If yes: add the most violated constraints to $S$ and iterate.

Good practical performance and theoretical guarantees:
polynomial-time convergence to within $\epsilon$ of the global optimum.
[Tsochantaridis et al.: "Large Margin Methods for Structured and Interdependent Output Variables", JMLR, 2005.]
40. Working Set S-SVM Training

input: training pairs $\{(x^1,y^1),\dots,(x^N,y^N)\}\subset\mathcal{X}\times\mathcal{Y}$,
input: feature map $\varphi(x,y)$, loss function $\Delta(y,y')$, regularizer $C$
1: $S \leftarrow \emptyset$
2: repeat
3:   $(w,\xi) \leftarrow$ solution of the QP with only the constraints in $S$
4:   for $n = 1,\dots,N$ do
5:     $\hat{y} \leftarrow \operatorname{argmax}_{y\in\mathcal{Y}}\ \Delta(y^n,y) + \langle w,\varphi(x^n,y)\rangle$
6:     if $\hat{y} \ne y^n$ then
7:       $S \leftarrow S \cup \{(x^n,\hat{y})\}$
8:     end if
9:   end for
10: until $S$ doesn't change anymore
output: prediction function $f(x) = \operatorname{argmax}_{y\in\mathcal{Y}}\langle w,\varphi(x,y)\rangle$.

Observation: each update of $w$ needs one argmax-prediction per example
(but we solve globally for the next $w$, not by local steps).
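Below is a sketch of this loop for the multiclass case, again reusing the hypothetical helpers from above. The restricted QP is handed to SciPy's generic SLSQP solver purely for illustration; real implementations use dedicated QP solvers. The working set stores (example index, violating label) pairs:

```python
import numpy as np
from scipy.optimize import minimize

def solve_restricted_qp(S, X, Y, K, C):
    """Solve the S-SVM QP using only the constraints in S, a list of
    (n, y_hat) pairs. Variables are z = [w, xi]; xi >= 0 via bounds."""
    N, D = len(X), X[0].shape[0]
    dim = K * D
    def objective(z):
        w, xi = z[:dim], z[dim:]
        return 0.5 * w @ w + (C / N) * xi.sum()
    constraints = []
    for n, y_hat in S:
        dphi = joint_feature(X[n], Y[n], K) - joint_feature(X[n], y_hat, K)
        margin = zero_one_loss(Y[n], y_hat)
        # <w, dphi> >= margin - xi_n  <=>  <w, dphi> - margin + xi_n >= 0
        constraints.append({"type": "ineq",
                            "fun": lambda z, d=dphi, m=margin, i=n:
                                z[:dim] @ d - m + z[dim + i]})
    bounds = [(None, None)] * dim + [(0, None)] * N
    result = minimize(objective, np.zeros(dim + N), method="SLSQP",
                      bounds=bounds, constraints=constraints)
    return result.x[:dim]

def working_set_train(X, Y, K, C):
    """Working Set S-SVM Training (slide 40): alternate between solving
    the restricted QP and adding currently violated labels."""
    S = []
    while True:
        w = solve_restricted_qp(S, X, Y, K, C)
        changed = False
        for n in range(len(X)):
            y_hat = loss_augmented_argmax(w, X[n], Y[n], K)
            if y_hat != Y[n] and (n, y_hat) not in S:
                S.append((n, y_hat))
                changed = True
        if not changed:
            return w
```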
41–42. One-Slack Formulation of S-SVM

(equivalent to the ordinary S-SVM formulation via $\xi = \frac{1}{N}\sum_n \xi^n$)
$$\min_{w\in\mathbb{R}^D,\ \xi\in\mathbb{R}_+}\ \frac{1}{2}\|w\|^2 + C\xi$$
subject to, for all $(\hat{y}^1,\dots,\hat{y}^N)\in\mathcal{Y}\times\dots\times\mathcal{Y}$:
$$\sum_{n=1}^N\big[\Delta(y^n,\hat{y}^n) + \langle w,\varphi(x^n,\hat{y}^n)\rangle - \langle w,\varphi(x^n,y^n)\rangle\big] \le N\xi.$$

$|\mathcal{Y}|^N$ linear constraints, convex, differentiable objective.
We blew up the constraint set even further; for 100 binary $16\times16$ images: $|\mathcal{Y}|^N = (2^{256})^{100} \approx 10^{7700}$ constraints (instead of $10^{79}$).
43. Working Set One-Slack S-SVM Training

input: training pairs $\{(x^1,y^1),\dots,(x^N,y^N)\}\subset\mathcal{X}\times\mathcal{Y}$,
input: feature map $\varphi(x,y)$, loss function $\Delta(y,y')$, regularizer $C$
1: $S \leftarrow \emptyset$
2: repeat
3:   $(w,\xi) \leftarrow$ solution of the QP with only the constraints in $S$
4:   for $n = 1,\dots,N$ do
5:     $\hat{y}^n \leftarrow \operatorname{argmax}_{y\in\mathcal{Y}}\ \Delta(y^n,y) + \langle w,\varphi(x^n,y)\rangle$
6:   end for
7:   $S \leftarrow S \cup \big\{\big((x^1,\dots,x^N),\ (\hat{y}^1,\dots,\hat{y}^N)\big)\big\}$
8: until $S$ doesn't change anymore
output: prediction function $f(x) = \operatorname{argmax}_{y\in\mathcal{Y}}\langle w,\varphi(x,y)\rangle$.

Often faster convergence: we add one strong constraint per iteration instead of $N$ weak ones.
44. We can solve an S-SVM like a non-linear SVM: compute the Lagrangian dual.
- min becomes max,
- the original (primal) variables $w,\xi$ disappear,
- new (dual) variables $\alpha_{ny}$: one per constraint of the original problem.

Dual S-SVM problem
$$\max_{\alpha\in\mathbb{R}^{N|\mathcal{Y}|}_+}\ \sum_{\substack{n=1,\dots,N\\ y\in\mathcal{Y}}}\alpha_{ny}\,\Delta(y^n,y)\ -\ \frac{1}{2}\sum_{\substack{n,\bar{n}=1,\dots,N\\ y,\bar{y}\in\mathcal{Y}}}\alpha_{ny}\,\alpha_{\bar{n}\bar{y}}\,\big\langle\delta\varphi(x^n,y^n,y),\ \delta\varphi(x^{\bar{n}},y^{\bar{n}},\bar{y})\big\rangle$$
subject to, for $n = 1,\dots,N$:
$$\sum_{y\in\mathcal{Y}}\alpha_{ny} \le \frac{C}{N}.$$

$N$ linear constraints, convex, differentiable objective, $N\,|\mathcal{Y}|$ variables.
45. We can kernelize:

Define a joint kernel function $k : (\mathcal{X}\times\mathcal{Y})\times(\mathcal{X}\times\mathcal{Y})\to\mathbb{R}$,
$$k\big((x,y),(\bar{x},\bar{y})\big) = \big\langle\varphi(x,y),\ \varphi(\bar{x},\bar{y})\big\rangle.$$
$k$ measures the similarity between two (input, output) pairs.

We can express the optimization in terms of $k$:
$$\begin{aligned}
\big\langle\delta\varphi(x^n,y^n,y),\ \delta\varphi(x^{\bar{n}},y^{\bar{n}},\bar{y})\big\rangle
&= \big\langle\varphi(x^n,y^n)-\varphi(x^n,y),\ \varphi(x^{\bar{n}},y^{\bar{n}})-\varphi(x^{\bar{n}},\bar{y})\big\rangle\\
&= k\big((x^n,y^n),(x^{\bar{n}},y^{\bar{n}})\big) - k\big((x^n,y^n),(x^{\bar{n}},\bar{y})\big)\\
&\quad - k\big((x^n,y),(x^{\bar{n}},y^{\bar{n}})\big) + k\big((x^n,y),(x^{\bar{n}},\bar{y})\big)\\
&=: K_{n\bar{n}y\bar{y}}
\end{aligned}$$
46. Kernelized S-SVM problem:
$$\max_{\alpha\in\mathbb{R}^{N|\mathcal{Y}|}_+}\ \sum_{\substack{n=1,\dots,N\\ y\in\mathcal{Y}}}\alpha_{ny}\,\Delta(y^n,y)\ -\ \frac{1}{2}\sum_{\substack{n,\bar{n}=1,\dots,N\\ y,\bar{y}\in\mathcal{Y}}}\alpha_{ny}\,\alpha_{\bar{n}\bar{y}}\,K_{n\bar{n}y\bar{y}}$$
subject to, for $n = 1,\dots,N$:
$$\sum_{y\in\mathcal{Y}}\alpha_{ny} \le \frac{C}{N}.$$

Too many variables: train with a working set of the $\alpha_{ny}$.

Kernelized prediction function:
$$f(x) = \operatorname{argmax}_{y\in\mathcal{Y}}\ \sum_{n,y'}\alpha_{ny'}\,k\big((x^n,y'),\ (x,y)\big)$$
47. What do "joint kernel functions" look like?
$$k\big((x,y),(\bar{x},\bar{y})\big) = \big\langle\varphi(x,y),\ \varphi(\bar{x},\bar{y})\big\rangle.$$
As with graphical models, things are easier if $\varphi$ decomposes with respect to factors:
$$\varphi(x,y) = \big(\varphi_F(x,y_F)\big)_{F\in\mathcal{F}}$$
Then the kernel decomposes into a sum over factors:
$$k\big((x,y),(\bar{x},\bar{y})\big) = \Big\langle\big(\varphi_F(x,y_F)\big)_{F\in\mathcal{F}},\ \big(\varphi_F(\bar{x},\bar{y}_F)\big)_{F\in\mathcal{F}}\Big\rangle = \sum_{F\in\mathcal{F}}\big\langle\varphi_F(x,y_F),\ \varphi_F(\bar{x},\bar{y}_F)\big\rangle = \sum_{F\in\mathcal{F}}k_F\big((x,y_F),(\bar{x},\bar{y}_F)\big)$$
We can define kernels for each factor (e.g. nonlinear).
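As a sketch, with a hypothetical interface in which each factor $F$ is a tuple of output-variable indices and `factor_kernels` maps it to its own kernel function:

```python
def joint_kernel(x, y, x2, y2, factors, factor_kernels):
    """Joint kernel decomposed over factors (slide 47):
    k((x,y),(x2,y2)) = sum_F k_F((x, y_F), (x2, y2_F)).
    factors: list of index tuples F; factor_kernels: dict F -> kernel."""
    return sum(factor_kernels[F](x, tuple(y[i] for i in F),
                                 x2, tuple(y2[i] for i in F))
               for F in factors)
```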
48. Example: figure-ground segmentation with grid structure
[Figure: $(x,y) =$ an input image $x$ and its binary segmentation mask $y$.]

Typical kernels: arbitrary in $x$, linear (or at least simple) with respect to $y$:
Unary factors:
$$k_p\big((x_p,y_p),(x'_p,y'_p)\big) = k(x_p,x'_p)\,[y_p{=}y'_p]$$
with $k(x_p,x'_p)$ a local image kernel, e.g. $\chi^2$ or histogram intersection.
Pairwise factors:
$$k_{pq}\big((y_p,y_q),(y'_p,y'_q)\big) = [y_p{=}y'_p]\,[y_q{=}y'_q]$$
More powerful than all-linear, and argmax-prediction is still possible.
49. Example: object localization
[Figure: $(x,y) =$ an image $x$ and a bounding box $y$ given by left/top/right/bottom coordinates.]

Only one factor, which includes all of $x$ and $y$:
$$k\big((x,y),(x',y')\big) = k_{\mathrm{image}}\big(x|_y,\ x'|_{y'}\big)$$
with $k_{\mathrm{image}}$ an image kernel and $x|_y$ the image region within box $y$.
argmax-prediction is as difficult as object localization with a $k_{\mathrm{image}}$-SVM.
50–51. Summary – S-SVM Learning

Given:
- training set $\{(x^1,y^1),\dots,(x^N,y^N)\}\subset\mathcal{X}\times\mathcal{Y}$
- loss function $\Delta : \mathcal{Y}\times\mathcal{Y}\to\mathbb{R}$

Task: learn a parameter $w$ for $f(x) := \operatorname{argmax}_y\langle w,\varphi(x,y)\rangle$ that minimizes the expected loss on future data.

S-SVM solution derived from the maximum-margin framework:
- enforce that the correct output is better than all others by a margin:
  $$\langle w,\varphi(x^n,y^n)\rangle \ge \Delta(y^n,y) + \langle w,\varphi(x^n,y)\rangle\quad\text{for all } y\in\mathcal{Y}.$$
- convex optimization problem, but non-differentiable
- many equivalent formulations → different training algorithms
- training needs repeated argmax prediction, but no probabilistic inference
52. Extra I: Beyond Fully Supervised Learning

So far, training was fully supervised: all variables were observed.
In real life, some variables are unobserved even during training:
- missing labels in the training data
- latent variables, e.g. part location, part occlusion, viewpoint
53–54. Three types of variables:
- $x\in\mathcal{X}$: always observed,
- $y\in\mathcal{Y}$: observed only during training,
- $z\in\mathcal{Z}$: never observed (latent).

Decision function: $f(x) = \operatorname{argmax}_{y\in\mathcal{Y}}\max_{z\in\mathcal{Z}}\langle w,\varphi(x,y,z)\rangle$

Maximum-Margin Training with Maximization over Latent Variables

Solve:
$$\min_{w,\xi}\ \frac{1}{2}\|w\|^2 + \frac{C}{N}\sum_{n=1}^N \xi^n$$
subject to, for $n = 1,\dots,N$, for all $y\in\mathcal{Y}$:
$$\Delta(y^n,y) + \max_{z\in\mathcal{Z}}\langle w,\varphi(x^n,y,z)\rangle - \max_{z\in\mathcal{Z}}\langle w,\varphi(x^n,y^n,z)\rangle \le \xi^n$$

Problem: not a convex problem → can have local minima.
[C. Yu, T. Joachims: "Learning Structural SVMs with Latent Variables", ICML, 2009]
Similar idea: [P. Felzenszwalb, D. McAllester, D. Ramanan: "A Discriminatively Trained, Multiscale, Deformable Part Model", CVPR, 2008]
55. Structured Learning is Full of Open Research Questions

How to train faster?
- CRFs need many runs of probabilistic inference,
- SSVMs need many runs of argmax-prediction.
How to reduce the necessary amount of training data?
- semi-supervised learning? transfer learning?
How can we better understand different loss functions?
- when to use probabilistic training, when maximum margin?
- CRFs are "consistent", SSVMs are not. Is this relevant?
Can we understand structured learning with approximate inference?
- often, computing $L(w)$ or $\operatorname{argmax}_y\langle w,\varphi(x,y)\rangle$ exactly is infeasible.
- can we guarantee good results even with approximate inference?
More and new applications!
56. Lunch Break

Continuing at 13:30.
Slides available at http://www.nowozin.net/sebastian/cvpr2011tutorial/