SlideShare a Scribd company logo
Zurich Universities of Applied Sciences and Arts
Learning End-to-End
Thilo Stadelmann & Thierry Musy
Datalab Seminar, 24. June 2015, Winterthur
Zurich Universities of Applied Sciences and Arts
DISCLAIMER
Today’s talk is about the basic ideas of a single, inspiring,
industry-proven paper from the nineties…
LeCun et al., “Gradient-Based Learning Applied to Document Recognition”, 1998
Zurich Universities of Applied Sciences and Arts
Agenda
Challenge
The bigger picture of ML – Sequential Supervised Learning as an example where
simplistic paradigms fail
Proposed Solution
Global learning – Graph Transformer Networks – Example: Digit string recognition
with heuristic oversegmentation – Advantage of discriminative training
Use Case
Error propagation through multiple modules – Check reading application
Conclusions
Summary – Outlook to other projects – Remarks
Zurich Universities of Applied Sciences and Arts
CHALLENGE: SEQUENTIAL SUPERVISED LEARNING
Zurich Universities of Applied Sciences and Arts
Machine Learning
Wikipedia on «Learning», 2015:
«...the act of acquiring new, or modifying and reinforcing, existing knowledge,
behaviors, skills, values, or preferences and may involve synthesizing
different types of information.»
A. Samuel, 1959:
«…gives computers the ability to learn without being explicitly
programmed.»
T.M. Mitchell, 1997:
«…if its performance at tasks in T, as measured by P, improves with
experience E.»
 In practice: Fitting parameters of a function to a set of data.
(data usually handcrafted, function chosen heuristically)
5
“do something”
Zurich Universities of Applied Sciences and Arts
A landmark work in the direction of «learning»
LeCun et al., “Gradient-Based Learning Applied to Document Recognition”, 1998
Outline
• Gradient-Based ML 
• Convolutional Neural Networks 
• Comparison with other Methods
• Multi-Module Systems &
Graph Transformer Networks
• Multiple Object Recognition &
Heuristic Oversegmentation
• Space Displacement Neural
Networks
• GTN’s as General Transducers
• On-Line Handwriting Recognition
System
• Check Reading System
6
Zurich Universities of Applied Sciences and Arts
Standard and Sequential Supervised Learning
Supervised Learning
Typical assumption on data:
• i.i.d.
• Surrounding tasks deemed simple(r)
Sequential Supervised Learning
Typical assumptions on data:
• Sequence information matters
• Overall task has many challenging
components (e.g., segmentation 
recognition  sequence assembly)
7
feature vectors labels
Zurich Universities of Applied Sciences and Arts
Approaches to classifying sequential data
«A bird in the hand…» approach
• Train standard classifier, extend it using a sliding window and post-processing
(e.g., smoothing)
Direct modeling approach
• Train a generative (statistical) model of the sequence generation process (e.g.,
HMM)
«…two in the bush» approach
• Build a unified pattern recognition processing chain, optimize it globally with a
unique criterion
See also: T.G. Dietterich, «Machine Learning for Sequential
Data – A Review», 2002
8
Zurich Universities of Applied Sciences and Arts
Images sources: See references on last slide
PROPOSED SOLUTION: GLOBAL LEARNING
Zurich Universities of Applied Sciences and Arts
Example: Reading handwritten strings
Challenge
• Identify correct character string («342») on a piece of paper
• Therefore: Find correct segmentation & recognize individual characters
10
Zurich Universities of Applied Sciences and Arts
Global Learning
Learning end-to-end
What we know: Traditional pattern
recognition system architecture
What we want: Train all parameters to
optimize a global performance criterion
Zurich Universities of Applied Sciences and Arts
Foundation: Gradient-based learning
A trainable system composed of
heterogeneous modules:
Backpropagation can be used if...
• cost (loss) function is differentiable
w.r.t. parameters
• modules are differentiable w.r.t.
parameters
 Gradient-based learning is the
unifying concept behind many
machine learning methods
 Object-oriented design approach:
Each module is a class with a
fprop() and bprop() method
Zurich Universities of Applied Sciences and Arts
Graph Transformer Networks
Network of pattern recognition modules that successively refine graph
representations of the input
GTNs
• Operate on graphs (b) of the input
instead fixed-size vectors (a)
• Graph: DAG with numerical information
(“penalties”) at the arcs
 GTN takes gradients w.r.t. module parameters and numerical data at input arcs
13
8.9
2.1 15.3 0.8 1.7
0.3
…
…
…
Zurich Universities of Applied Sciences and Arts
Example: Heuristic oversegmentation
…for reading handwritten strings
14
Contains all possible segmentations
(arcs: penalties & images)
Recognizes individual characters
on single images (e.g. CNN)
Combined penalties
 handled via integrated training
Select path with least cum. penalty
For each arc in Gseg: Create new
one per (character) class, penalty
and label attached
Sum up all penalties
Loss Function: Average penalty
(over all training data) of best (i.e.,
lowest penalty) correct path (i.e.
associated with correct sequence)
Zurich Universities of Applied Sciences and Arts
Gradient w.r.t to loss function: 1
Gradient: 1 (for all arcs
appearing in Viterbi path),
0 otherwise
Discriminative training wins
«Viterbi» training Discriminative training
15
Update: Ground Truth
Loss function value:
L = Ccvit
Gradient: like Viterbi
Transformer
Gradient: Backprop for NN
(one Instance per arc)
Problems:
1. Trivial solution possible (Recognizer ignores input & sets all outputs to small values)
2. Penalty does not take competing answers into account (i.e., ignores training signals)
Update: Desired best
output (left) vs. best
output (correct or not;
right)
Solved: Discriminative training builds the class-”separating
surfaces rather than modeling individual classes independently of
each other”  L=0 if best path is a correct path.
Loss function value:
L = Ccvit - Cvit
Problem: Produces no
margin as long as
classification is correct (see
paper for solution).
Zurich Universities of Applied Sciences and Arts
Remarks
Discriminative training
• Uses all available training signals
• Utilizes “penalties”, not probabilities
 No need for normalization
 Enforcing normalization is
“complex, inefficient, time
consuming, ill-conditions the loss
function” [according to paper]
• Is the easiest/direct way to achieve
the objective of classification (as
opposed to Generative training,
that solves the more complex density
estimation task as an intermediary
result)
List of possible GT modules
• All building blocks of (C)NNs
(layers, nonlinearity etc.)
• Multiplexer (though not
differentiable w.r.t. to switching input)
 can be used to dynamically rewire
GTN architecture per input
• min-function (though not
differentiable everywhere)
• Loss function
16
Zurich Universities of Applied Sciences and Arts
EXAMPLE: CHECK READER APPLICATION
Zurich Universities of Applied Sciences and Arts
What is the amount?
18
Courtesy amount
Legal amount
easy for human
but time consuming
interest in automating the process
Zurich Universities of Applied Sciences and Arts
A Check Reading System
19
$ 6.00Goal: Successfully read the amount of
Zurich Universities of Applied Sciences and Arts
A Check Reading System
• NCR
• Reading millions of checks
per day
• GTN-based
• Aimed performance:
• 50% correct
• 49% reject
• 1% error
20
Zurich Universities of Applied Sciences and Arts
Check graph
• Input: trivial graph with a single arc that carries
the image of the whole check
21
Zurich Universities of Applied Sciences and Arts
Field location
Transformer
• Performs classical image analysis (e.g. Connected
component analysis, Ink density histogram, Layout
analysis)
• Heuristically extract rectangluar zones which may content
the amount.
Field Graph
22
Zurich Universities of Applied Sciences and Arts
Segmentation
Transformer
• Performs heuristic image processing techniques (here hit
an deflect) to find candidate cuts.
Field Graph
23
Zurich Universities of Applied Sciences and Arts
Recognition
Transformer
• Iterates over all arcs in the segmentation graph and runs a
character recognizer [here: LeNet5 with 96 classes
(ASCII: 95, rubbish: 1)]
Field Graph
24
„5“ 0.2
„6“ 0.1
„7“ 0.5
CNN
Zurich Universities of Applied Sciences and Arts
Composition
Transformer
• Takes two graphs, the recognition graph and a grammar
graph, to find valid representation of check amounts.
Field Graph
25
Composition
Graph with all
well-formed
amounts
Zurich Universities of Applied Sciences and Arts
Composition
26
Zurich Universities of Applied Sciences and Arts
Viterbi
Transformer
• Selects the path with the lowest accumulated penalty.
Field Graph
27
Zurich Universities of Applied Sciences and Arts
Learning
• Every transformer need to be differentiable
• Each stage of the system has tunable parameters
• Initialized with reasonable values
• Trained with whole checks labeled with the correct amount
• Loss function E minimized by Discriminative Forward
criterion:
L = Ccvit - Cvit
28
Zurich Universities of Applied Sciences and Arts
Discriminative Forward
• Difference between:
• Path with the lowest sum of penalties and the correct Seq
• Path with the lowest sum of penalties and the allowed Seq
29
Zurich Universities of Applied Sciences and Arts
REMARKS / OUTLOOK
Zurich Universities of Applied Sciences and Arts
Conclusions
Less need for manual labeling
• Ground truth only needed for final result
(not for every intermediate result like e.g.
segmentation)
Early errors can be adjusted later due to…
• …unified training of all pattern recognition modules under one regime
• …postponing hard decisions until the very end
No call upon probability theory for modeling / justification
• Occam’s razor: Choose easier discriminative model over generative one
Vapnik: Don’t solve a more complex problem than necessary
• No need for normalization when dealing with “penalties”
instead of probabilities  no “other class” examples needed
• Less constrains on system architecture and module selection
31
Example:Learningtosegmentwithout
intermediatelabels
Zurich Universities of Applied Sciences and Arts
Further Reading
• Original short paper: Bottou, Bengio & LeCun, “Global Training of Docu-
ment Processing Systems using Graph Transformer Networks”, 1997
http://www.iro.umontreal.ca/~lisa/pointeurs/bottou-lecun-bengio-97.pdf
• Landmark long paper: LeCun, Bottou, Bengio & Haffner, “Gradient-
Based Learning Applied to Document Recognition”, 1998
http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
• Slide set by the original authors: Bottou, “Graph Transformer Networks”,
2001
http://leon.bottou.org/talks/gtn
• Overview: Dietterich, “Machine Learning for Sequential Data: A Review”, 2002
http://eecs.oregonstate.edu/~tgd/publications/mlsd-ssspr.pdf
• Recent work: Collobert, “Deep Learning for Efficient Discriminative Parsing”, 2011
http://ronan.collobert.com/pub/matos/2011_parsing_aistats.pdf

More Related Content

What's hot

Causative Adversarial Learning
Causative Adversarial LearningCausative Adversarial Learning
Causative Adversarial Learning
David Dao
 
雲端影音與物聯網平台的軟體工程挑戰:以 Skywatch 為例-陳維超
雲端影音與物聯網平台的軟體工程挑戰:以 Skywatch 為例-陳維超雲端影音與物聯網平台的軟體工程挑戰:以 Skywatch 為例-陳維超
雲端影音與物聯網平台的軟體工程挑戰:以 Skywatch 為例-陳維超
台灣資料科學年會
 
2015 03-28-eb-final
2015 03-28-eb-final2015 03-28-eb-final
2015 03-28-eb-final
Christopher Wilson
 
Gradient Boosting Machines (GBM): from Zero to Hero (with R and Python code)
Gradient Boosting Machines (GBM): from Zero to Hero (with R and Python code)Gradient Boosting Machines (GBM): from Zero to Hero (with R and Python code)
Gradient Boosting Machines (GBM): from Zero to Hero (with R and Python code)
Data Con LA
 
ICST 2017 Day1 Opening Ceremony Research Track
ICST 2017 Day1 Opening Ceremony Research TrackICST 2017 Day1 Opening Ceremony Research Track
ICST 2017 Day1 Opening Ceremony Research Track
Hironori Washizaki
 
"An Introduction to Machine Learning and How to Teach Machines to See," a Pre...
"An Introduction to Machine Learning and How to Teach Machines to See," a Pre..."An Introduction to Machine Learning and How to Teach Machines to See," a Pre...
"An Introduction to Machine Learning and How to Teach Machines to See," a Pre...
Edge AI and Vision Alliance
 
Demystifying Machine Learning and Artificial Intelligence
Demystifying Machine Learning and Artificial IntelligenceDemystifying Machine Learning and Artificial Intelligence
Demystifying Machine Learning and Artificial Intelligence
EPCC, University of Edinburgh
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
Sangath babu
 
Europython - Machine Learning for dummies with Python
Europython - Machine Learning for dummies with PythonEuropython - Machine Learning for dummies with Python
Europython - Machine Learning for dummies with Python
Javier Arias Losada
 
Overview of Machine Learning and its Applications
Overview of Machine Learning and its ApplicationsOverview of Machine Learning and its Applications
Overview of Machine Learning and its Applications
Deepak Chawla
 
Visual concept learning
Visual concept learningVisual concept learning
Visual concept learning
Vaibhav Singh
 
Learning from meaningful, purposive interaction
Learning from meaningful, purposive interactionLearning from meaningful, purposive interaction
Learning from meaningful, purposive interaction
fridolin.wild
 
Lecture 1: Introduction to the Course (Practical Information)
Lecture 1: Introduction to the Course (Practical Information)Lecture 1: Introduction to the Course (Practical Information)
Lecture 1: Introduction to the Course (Practical Information)
Marina Santini
 
Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?
Manuel Martín
 
Anomaly Detection using Deep Auto-Encoders
Anomaly Detection using Deep Auto-EncodersAnomaly Detection using Deep Auto-Encoders
Anomaly Detection using Deep Auto-Encoders
Gianmario Spacagna
 
Moving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow ExtendedMoving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow Extended
Jonathan Mugan
 
[系列活動] 人工智慧與機器學習在推薦系統上的應用
[系列活動] 人工智慧與機器學習在推薦系統上的應用[系列活動] 人工智慧與機器學習在推薦系統上的應用
[系列活動] 人工智慧與機器學習在推薦系統上的應用
台灣資料科學年會
 
Multi Task Learning for Recommendation Systems
Multi Task Learning for Recommendation SystemsMulti Task Learning for Recommendation Systems
Multi Task Learning for Recommendation Systems
Vaibhav Singh
 
巨量與開放資料之創新機會與關鍵挑戰-曾新穆
巨量與開放資料之創新機會與關鍵挑戰-曾新穆巨量與開放資料之創新機會與關鍵挑戰-曾新穆
巨量與開放資料之創新機會與關鍵挑戰-曾新穆
台灣資料科學年會
 
Himansu sahoo resume-ds
Himansu sahoo resume-dsHimansu sahoo resume-ds
Himansu sahoo resume-ds
Himansu Sahoo
 

What's hot (20)

Causative Adversarial Learning
Causative Adversarial LearningCausative Adversarial Learning
Causative Adversarial Learning
 
雲端影音與物聯網平台的軟體工程挑戰:以 Skywatch 為例-陳維超
雲端影音與物聯網平台的軟體工程挑戰:以 Skywatch 為例-陳維超雲端影音與物聯網平台的軟體工程挑戰:以 Skywatch 為例-陳維超
雲端影音與物聯網平台的軟體工程挑戰:以 Skywatch 為例-陳維超
 
2015 03-28-eb-final
2015 03-28-eb-final2015 03-28-eb-final
2015 03-28-eb-final
 
Gradient Boosting Machines (GBM): from Zero to Hero (with R and Python code)
Gradient Boosting Machines (GBM): from Zero to Hero (with R and Python code)Gradient Boosting Machines (GBM): from Zero to Hero (with R and Python code)
Gradient Boosting Machines (GBM): from Zero to Hero (with R and Python code)
 
ICST 2017 Day1 Opening Ceremony Research Track
ICST 2017 Day1 Opening Ceremony Research TrackICST 2017 Day1 Opening Ceremony Research Track
ICST 2017 Day1 Opening Ceremony Research Track
 
"An Introduction to Machine Learning and How to Teach Machines to See," a Pre...
"An Introduction to Machine Learning and How to Teach Machines to See," a Pre..."An Introduction to Machine Learning and How to Teach Machines to See," a Pre...
"An Introduction to Machine Learning and How to Teach Machines to See," a Pre...
 
Demystifying Machine Learning and Artificial Intelligence
Demystifying Machine Learning and Artificial IntelligenceDemystifying Machine Learning and Artificial Intelligence
Demystifying Machine Learning and Artificial Intelligence
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
Europython - Machine Learning for dummies with Python
Europython - Machine Learning for dummies with PythonEuropython - Machine Learning for dummies with Python
Europython - Machine Learning for dummies with Python
 
Overview of Machine Learning and its Applications
Overview of Machine Learning and its ApplicationsOverview of Machine Learning and its Applications
Overview of Machine Learning and its Applications
 
Visual concept learning
Visual concept learningVisual concept learning
Visual concept learning
 
Learning from meaningful, purposive interaction
Learning from meaningful, purposive interactionLearning from meaningful, purposive interaction
Learning from meaningful, purposive interaction
 
Lecture 1: Introduction to the Course (Practical Information)
Lecture 1: Introduction to the Course (Practical Information)Lecture 1: Introduction to the Course (Practical Information)
Lecture 1: Introduction to the Course (Practical Information)
 
Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?
 
Anomaly Detection using Deep Auto-Encoders
Anomaly Detection using Deep Auto-EncodersAnomaly Detection using Deep Auto-Encoders
Anomaly Detection using Deep Auto-Encoders
 
Moving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow ExtendedMoving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow Extended
 
[系列活動] 人工智慧與機器學習在推薦系統上的應用
[系列活動] 人工智慧與機器學習在推薦系統上的應用[系列活動] 人工智慧與機器學習在推薦系統上的應用
[系列活動] 人工智慧與機器學習在推薦系統上的應用
 
Multi Task Learning for Recommendation Systems
Multi Task Learning for Recommendation SystemsMulti Task Learning for Recommendation Systems
Multi Task Learning for Recommendation Systems
 
巨量與開放資料之創新機會與關鍵挑戰-曾新穆
巨量與開放資料之創新機會與關鍵挑戰-曾新穆巨量與開放資料之創新機會與關鍵挑戰-曾新穆
巨量與開放資料之創新機會與關鍵挑戰-曾新穆
 
Himansu sahoo resume-ds
Himansu sahoo resume-dsHimansu sahoo resume-ds
Himansu sahoo resume-ds
 

Viewers also liked

Der Wert von Daten in Zeiten von "Big Data"
Der Wert von Daten in Zeiten von "Big Data"Der Wert von Daten in Zeiten von "Big Data"
Der Wert von Daten in Zeiten von "Big Data"
Thilo Stadelmann
 
Data Science - (K)eine Teenagerliebe
Data Science - (K)eine TeenagerliebeData Science - (K)eine Teenagerliebe
Data Science - (K)eine Teenagerliebe
Thilo Stadelmann
 
Wie die Swiss Alliance for Data-Intensive Services datenbasierte Mehrwerte sc...
Wie die Swiss Alliance for Data-Intensive Services datenbasierte Mehrwerte sc...Wie die Swiss Alliance for Data-Intensive Services datenbasierte Mehrwerte sc...
Wie die Swiss Alliance for Data-Intensive Services datenbasierte Mehrwerte sc...
Thilo Stadelmann
 
Was denken denkende Maschinen?
Was denken denkende Maschinen?Was denken denkende Maschinen?
Was denken denkende Maschinen?
Thilo Stadelmann
 
Vehicle Technology Return on Investment
Vehicle Technology Return on InvestmentVehicle Technology Return on Investment
Vehicle Technology Return on Investment
Joel Beal
 
Take the Guesswork Out of Measuring ROI for Sales Training
Take the Guesswork Out of Measuring ROI for Sales TrainingTake the Guesswork Out of Measuring ROI for Sales Training
Take the Guesswork Out of Measuring ROI for Sales Training
Corporate Visions
 
Technology Trends Return on Investment
Technology Trends Return on InvestmentTechnology Trends Return on Investment
Technology Trends Return on Investment
Zylog Systems Canada Ltd
 
Attain Superior Sales Performance Through Insight Driven Oracle Sales Analytics
Attain Superior Sales Performance Through Insight Driven Oracle Sales AnalyticsAttain Superior Sales Performance Through Insight Driven Oracle Sales Analytics
Attain Superior Sales Performance Through Insight Driven Oracle Sales Analytics
Jerome Leonard
 
Return on Investment from IssueTrak Software
Return on Investment from IssueTrak SoftwareReturn on Investment from IssueTrak Software
Return on Investment from IssueTrak Software
hdicapitalarea
 
"Oracle Insight for Investors" Educational Webcast - Oracle Fusion Middleware
"Oracle Insight for Investors" Educational Webcast - Oracle Fusion Middleware"Oracle Insight for Investors" Educational Webcast - Oracle Fusion Middleware
"Oracle Insight for Investors" Educational Webcast - Oracle Fusion Middleware
investorrelation
 
The Readiness of ADF Essentials for Public-facing Web Applications
The Readiness of ADF Essentials for Public-facing Web ApplicationsThe Readiness of ADF Essentials for Public-facing Web Applications
The Readiness of ADF Essentials for Public-facing Web Applications
LansenConsulting
 
Closing the Gap on ROI Measurement - Spur Interactive, Steve Interactive
Closing the Gap on ROI Measurement - Spur Interactive, Steve InteractiveClosing the Gap on ROI Measurement - Spur Interactive, Steve Interactive
Closing the Gap on ROI Measurement - Spur Interactive, Steve Interactive
Online Marketing Summit
 
Desktop Transformation Success - The 5 Secrets to Delivering User Satisfactio...
Desktop Transformation Success - The 5 Secrets to Delivering User Satisfactio...Desktop Transformation Success - The 5 Secrets to Delivering User Satisfactio...
Desktop Transformation Success - The 5 Secrets to Delivering User Satisfactio...
eG Innovations
 
Justifying capacity management by demonstrating the return on investment
Justifying capacity management by demonstrating the return on investmentJustifying capacity management by demonstrating the return on investment
Justifying capacity management by demonstrating the return on investment
Metron
 
Oracle EPM Day Boston - Improving Performance with Enhanced Insight into Pro...
Oracle EPM Day Boston -  Improving Performance with Enhanced Insight into Pro...Oracle EPM Day Boston -  Improving Performance with Enhanced Insight into Pro...
Oracle EPM Day Boston - Improving Performance with Enhanced Insight into Pro...
Alithya
 
Oracle Insight - Cloud Events Presentation V5 LinkedIn
Oracle Insight - Cloud Events Presentation V5 LinkedInOracle Insight - Cloud Events Presentation V5 LinkedIn
Oracle Insight - Cloud Events Presentation V5 LinkedIn
Ayman Eid
 
Enhancing Return On Investment
Enhancing Return On InvestmentEnhancing Return On Investment
Enhancing Return On Investment
Akash Dolas,PMP [Open Networker]
 
Advancing Return on Investment Analysis for Government IT: A Public Value Fra...
Advancing Return on Investment Analysis for Government IT: A Public Value Fra...Advancing Return on Investment Analysis for Government IT: A Public Value Fra...
Advancing Return on Investment Analysis for Government IT: A Public Value Fra...
FindWhitePapers
 
Sales ROI Benchmarking
Sales ROI BenchmarkingSales ROI Benchmarking
Sales ROI Benchmarking
dreamforce2006
 
ROI: Reality or Illusion
ROI: Reality or IllusionROI: Reality or Illusion
ROI: Reality or Illusion
Chris Fritsch, JD
 

Viewers also liked (20)

Der Wert von Daten in Zeiten von "Big Data"
Der Wert von Daten in Zeiten von "Big Data"Der Wert von Daten in Zeiten von "Big Data"
Der Wert von Daten in Zeiten von "Big Data"
 
Data Science - (K)eine Teenagerliebe
Data Science - (K)eine TeenagerliebeData Science - (K)eine Teenagerliebe
Data Science - (K)eine Teenagerliebe
 
Wie die Swiss Alliance for Data-Intensive Services datenbasierte Mehrwerte sc...
Wie die Swiss Alliance for Data-Intensive Services datenbasierte Mehrwerte sc...Wie die Swiss Alliance for Data-Intensive Services datenbasierte Mehrwerte sc...
Wie die Swiss Alliance for Data-Intensive Services datenbasierte Mehrwerte sc...
 
Was denken denkende Maschinen?
Was denken denkende Maschinen?Was denken denkende Maschinen?
Was denken denkende Maschinen?
 
Vehicle Technology Return on Investment
Vehicle Technology Return on InvestmentVehicle Technology Return on Investment
Vehicle Technology Return on Investment
 
Take the Guesswork Out of Measuring ROI for Sales Training
Take the Guesswork Out of Measuring ROI for Sales TrainingTake the Guesswork Out of Measuring ROI for Sales Training
Take the Guesswork Out of Measuring ROI for Sales Training
 
Technology Trends Return on Investment
Technology Trends Return on InvestmentTechnology Trends Return on Investment
Technology Trends Return on Investment
 
Attain Superior Sales Performance Through Insight Driven Oracle Sales Analytics
Attain Superior Sales Performance Through Insight Driven Oracle Sales AnalyticsAttain Superior Sales Performance Through Insight Driven Oracle Sales Analytics
Attain Superior Sales Performance Through Insight Driven Oracle Sales Analytics
 
Return on Investment from IssueTrak Software
Return on Investment from IssueTrak SoftwareReturn on Investment from IssueTrak Software
Return on Investment from IssueTrak Software
 
"Oracle Insight for Investors" Educational Webcast - Oracle Fusion Middleware
"Oracle Insight for Investors" Educational Webcast - Oracle Fusion Middleware"Oracle Insight for Investors" Educational Webcast - Oracle Fusion Middleware
"Oracle Insight for Investors" Educational Webcast - Oracle Fusion Middleware
 
The Readiness of ADF Essentials for Public-facing Web Applications
The Readiness of ADF Essentials for Public-facing Web ApplicationsThe Readiness of ADF Essentials for Public-facing Web Applications
The Readiness of ADF Essentials for Public-facing Web Applications
 
Closing the Gap on ROI Measurement - Spur Interactive, Steve Interactive
Closing the Gap on ROI Measurement - Spur Interactive, Steve InteractiveClosing the Gap on ROI Measurement - Spur Interactive, Steve Interactive
Closing the Gap on ROI Measurement - Spur Interactive, Steve Interactive
 
Desktop Transformation Success - The 5 Secrets to Delivering User Satisfactio...
Desktop Transformation Success - The 5 Secrets to Delivering User Satisfactio...Desktop Transformation Success - The 5 Secrets to Delivering User Satisfactio...
Desktop Transformation Success - The 5 Secrets to Delivering User Satisfactio...
 
Justifying capacity management by demonstrating the return on investment
Justifying capacity management by demonstrating the return on investmentJustifying capacity management by demonstrating the return on investment
Justifying capacity management by demonstrating the return on investment
 
Oracle EPM Day Boston - Improving Performance with Enhanced Insight into Pro...
Oracle EPM Day Boston -  Improving Performance with Enhanced Insight into Pro...Oracle EPM Day Boston -  Improving Performance with Enhanced Insight into Pro...
Oracle EPM Day Boston - Improving Performance with Enhanced Insight into Pro...
 
Oracle Insight - Cloud Events Presentation V5 LinkedIn
Oracle Insight - Cloud Events Presentation V5 LinkedInOracle Insight - Cloud Events Presentation V5 LinkedIn
Oracle Insight - Cloud Events Presentation V5 LinkedIn
 
Enhancing Return On Investment
Enhancing Return On InvestmentEnhancing Return On Investment
Enhancing Return On Investment
 
Advancing Return on Investment Analysis for Government IT: A Public Value Fra...
Advancing Return on Investment Analysis for Government IT: A Public Value Fra...Advancing Return on Investment Analysis for Government IT: A Public Value Fra...
Advancing Return on Investment Analysis for Government IT: A Public Value Fra...
 
Sales ROI Benchmarking
Sales ROI BenchmarkingSales ROI Benchmarking
Sales ROI Benchmarking
 
ROI: Reality or Illusion
ROI: Reality or IllusionROI: Reality or Illusion
ROI: Reality or Illusion
 

Similar to Learning End to End

Data Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsData Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operations
Vincenzo Gulisano
 
Combinatorial optimization and deep reinforcement learning
Combinatorial optimization and deep reinforcement learningCombinatorial optimization and deep reinforcement learning
Combinatorial optimization and deep reinforcement learning
민재 정
 
Keynote at IWLS 2017
Keynote at IWLS 2017Keynote at IWLS 2017
Keynote at IWLS 2017
Manish Pandey
 
theory of computation lecture 01
theory of computation lecture 01theory of computation lecture 01
theory of computation lecture 01
8threspecter
 
Cs854 lecturenotes01
Cs854 lecturenotes01Cs854 lecturenotes01
Cs854 lecturenotes01
Mehmet Çelik
 
Segment_1_New computer algorithm for cse.pptx
Segment_1_New computer algorithm for cse.pptxSegment_1_New computer algorithm for cse.pptx
Segment_1_New computer algorithm for cse.pptx
fahmidasetu
 
Mtech syllabus computer science uptu
Mtech syllabus computer science uptu Mtech syllabus computer science uptu
Mtech syllabus computer science uptu
Abhishek Kesharwani
 
Lecture 1.pptx
Lecture 1.pptxLecture 1.pptx
Pregel
PregelPregel
Pregel
Weiru Dai
 
Analysis and Modelling of CMOS Gm-C Filters through Machine Learning
Analysis and Modelling of CMOS Gm-C Filters through Machine LearningAnalysis and Modelling of CMOS Gm-C Filters through Machine Learning
Analysis and Modelling of CMOS Gm-C Filters through Machine Learning
Malinka Ivanova
 
Alexander Mikov - Program Tools for Dynamic Investigation of Social Networks
Alexander Mikov - Program Tools for Dynamic Investigation of Social NetworksAlexander Mikov - Program Tools for Dynamic Investigation of Social Networks
Alexander Mikov - Program Tools for Dynamic Investigation of Social Networks
AIST
 
Unsupervised Learning: Clustering
Unsupervised Learning: Clustering Unsupervised Learning: Clustering
Unsupervised Learning: Clustering
Experfy
 
Compositional Blocks for Optimal Self-Healing Gradients
Compositional Blocks for Optimal Self-Healing GradientsCompositional Blocks for Optimal Self-Healing Gradients
Compositional Blocks for Optimal Self-Healing Gradients
Roberto Casadei
 
Csci101 lect00 introduction
Csci101 lect00 introductionCsci101 lect00 introduction
Csci101 lect00 introduction
Elsayed Hemayed
 
Lecture 1 (bce-7)
Lecture   1 (bce-7)Lecture   1 (bce-7)
Lecture 1 (bce-7)
farazahmad005
 
Data Structures - Lecture 1 [introduction]
Data Structures - Lecture 1 [introduction]Data Structures - Lecture 1 [introduction]
Data Structures - Lecture 1 [introduction]
Muhammad Hammad Waseem
 
Document clustering for forensic analysis an approach for improving compute...
Document clustering for forensic   analysis an approach for improving compute...Document clustering for forensic   analysis an approach for improving compute...
Document clustering for forensic analysis an approach for improving compute...
Madan Golla
 
introduction.pdf
introduction.pdfintroduction.pdf
introduction.pdf
rafayqadeer2
 
DEBS-2023.pdf
DEBS-2023.pdfDEBS-2023.pdf
DEBS-2023.pdf
vschiavoni
 
Computational Thinking in the Workforce and Next Generation Science Standards...
Computational Thinking in the Workforce and Next Generation Science Standards...Computational Thinking in the Workforce and Next Generation Science Standards...
Computational Thinking in the Workforce and Next Generation Science Standards...
Josh Sheldon
 

Similar to Learning End to End (20)

Data Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsData Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operations
 
Combinatorial optimization and deep reinforcement learning
Combinatorial optimization and deep reinforcement learningCombinatorial optimization and deep reinforcement learning
Combinatorial optimization and deep reinforcement learning
 
Keynote at IWLS 2017
Keynote at IWLS 2017Keynote at IWLS 2017
Keynote at IWLS 2017
 
theory of computation lecture 01
theory of computation lecture 01theory of computation lecture 01
theory of computation lecture 01
 
Cs854 lecturenotes01
Cs854 lecturenotes01Cs854 lecturenotes01
Cs854 lecturenotes01
 
Segment_1_New computer algorithm for cse.pptx
Segment_1_New computer algorithm for cse.pptxSegment_1_New computer algorithm for cse.pptx
Segment_1_New computer algorithm for cse.pptx
 
Mtech syllabus computer science uptu
Mtech syllabus computer science uptu Mtech syllabus computer science uptu
Mtech syllabus computer science uptu
 
Lecture 1.pptx
Lecture 1.pptxLecture 1.pptx
Lecture 1.pptx
 
Pregel
PregelPregel
Pregel
 
Analysis and Modelling of CMOS Gm-C Filters through Machine Learning
Analysis and Modelling of CMOS Gm-C Filters through Machine LearningAnalysis and Modelling of CMOS Gm-C Filters through Machine Learning
Analysis and Modelling of CMOS Gm-C Filters through Machine Learning
 
Alexander Mikov - Program Tools for Dynamic Investigation of Social Networks
Alexander Mikov - Program Tools for Dynamic Investigation of Social NetworksAlexander Mikov - Program Tools for Dynamic Investigation of Social Networks
Alexander Mikov - Program Tools for Dynamic Investigation of Social Networks
 
Unsupervised Learning: Clustering
Unsupervised Learning: Clustering Unsupervised Learning: Clustering
Unsupervised Learning: Clustering
 
Compositional Blocks for Optimal Self-Healing Gradients
Compositional Blocks for Optimal Self-Healing GradientsCompositional Blocks for Optimal Self-Healing Gradients
Compositional Blocks for Optimal Self-Healing Gradients
 
Csci101 lect00 introduction
Csci101 lect00 introductionCsci101 lect00 introduction
Csci101 lect00 introduction
 
Lecture 1 (bce-7)
Lecture   1 (bce-7)Lecture   1 (bce-7)
Lecture 1 (bce-7)
 
Data Structures - Lecture 1 [introduction]
Data Structures - Lecture 1 [introduction]Data Structures - Lecture 1 [introduction]
Data Structures - Lecture 1 [introduction]
 
Document clustering for forensic analysis an approach for improving compute...
Document clustering for forensic   analysis an approach for improving compute...Document clustering for forensic   analysis an approach for improving compute...
Document clustering for forensic analysis an approach for improving compute...
 
introduction.pdf
introduction.pdfintroduction.pdf
introduction.pdf
 
DEBS-2023.pdf
DEBS-2023.pdfDEBS-2023.pdf
DEBS-2023.pdf
 
Computational Thinking in the Workforce and Next Generation Science Standards...
Computational Thinking in the Workforce and Next Generation Science Standards...Computational Thinking in the Workforce and Next Generation Science Standards...
Computational Thinking in the Workforce and Next Generation Science Standards...
 

Recently uploaded

MISS TEEN LUCKNOW 2024 - WINNER ASIYA 2024
MISS TEEN LUCKNOW 2024 - WINNER ASIYA 2024MISS TEEN LUCKNOW 2024 - WINNER ASIYA 2024
MISS TEEN LUCKNOW 2024 - WINNER ASIYA 2024
DK PAGEANT
 
Biography and career history of Bruno Amezcua
Biography and career history of Bruno AmezcuaBiography and career history of Bruno Amezcua
Biography and career history of Bruno Amezcua
Bruno Amezcua
 
thrifthands-thrift store- get the latest trends
thrifthands-thrift store- get the latest trendsthrifthands-thrift store- get the latest trends
thrifthands-thrift store- get the latest trends
amarshifan555
 
Self-Discipline: The Secret Weapon for Certain Victory
Self-Discipline: The Secret Weapon for Certain VictorySelf-Discipline: The Secret Weapon for Certain Victory
Self-Discipline: The Secret Weapon for Certain Victory
bluetroyvictorVinay
 
Insanony: Watch Instagram Stories Secretly - A Complete Guide
Insanony: Watch Instagram Stories Secretly - A Complete GuideInsanony: Watch Instagram Stories Secretly - A Complete Guide
Insanony: Watch Instagram Stories Secretly - A Complete Guide
Trending Blogers
 
Types of Garage Doors Explained: Energy Efficiency, Style, and More
Types of Garage Doors Explained: Energy Efficiency, Style, and MoreTypes of Garage Doors Explained: Energy Efficiency, Style, and More
Types of Garage Doors Explained: Energy Efficiency, Style, and More
Affordable Garage Door Repair
 
MRS PUNE 2024 - WINNER AMRUTHAA UTTAM JAGDHANE
MRS PUNE 2024 - WINNER AMRUTHAA UTTAM JAGDHANEMRS PUNE 2024 - WINNER AMRUTHAA UTTAM JAGDHANE
MRS PUNE 2024 - WINNER AMRUTHAA UTTAM JAGDHANE
DK PAGEANT
 
Analysis and Assessment of Gateway Process – HemiSync(1).PDF
Analysis and Assessment of Gateway Process – HemiSync(1).PDFAnalysis and Assessment of Gateway Process – HemiSync(1).PDF
Analysis and Assessment of Gateway Process – HemiSync(1).PDF
JoshuaDagama1
 
The Fascinating World of Bats: Unveiling the Secrets of the Night
The Fascinating World of Bats: Unveiling the Secrets of the NightThe Fascinating World of Bats: Unveiling the Secrets of the Night
The Fascinating World of Bats: Unveiling the Secrets of the Night
thomasard1122
 
一比一原版(McGill毕业证书)麦吉尔大学毕业证如何办理
一比一原版(McGill毕业证书)麦吉尔大学毕业证如何办理一比一原版(McGill毕业证书)麦吉尔大学毕业证如何办理
一比一原版(McGill毕业证书)麦吉尔大学毕业证如何办理
lyurzi7r
 
Capsule Wardrobe Women: A document show
Capsule Wardrobe Women:  A document showCapsule Wardrobe Women:  A document show
Capsule Wardrobe Women: A document show
mustaphaadeyemi08
 

Recently uploaded (11)

MISS TEEN LUCKNOW 2024 - WINNER ASIYA 2024
MISS TEEN LUCKNOW 2024 - WINNER ASIYA 2024MISS TEEN LUCKNOW 2024 - WINNER ASIYA 2024
MISS TEEN LUCKNOW 2024 - WINNER ASIYA 2024
 
Biography and career history of Bruno Amezcua
Biography and career history of Bruno AmezcuaBiography and career history of Bruno Amezcua
Biography and career history of Bruno Amezcua
 
thrifthands-thrift store- get the latest trends
thrifthands-thrift store- get the latest trendsthrifthands-thrift store- get the latest trends
thrifthands-thrift store- get the latest trends
 
Self-Discipline: The Secret Weapon for Certain Victory
Self-Discipline: The Secret Weapon for Certain VictorySelf-Discipline: The Secret Weapon for Certain Victory
Self-Discipline: The Secret Weapon for Certain Victory
 
Insanony: Watch Instagram Stories Secretly - A Complete Guide
Insanony: Watch Instagram Stories Secretly - A Complete GuideInsanony: Watch Instagram Stories Secretly - A Complete Guide
Insanony: Watch Instagram Stories Secretly - A Complete Guide
 
Types of Garage Doors Explained: Energy Efficiency, Style, and More
Types of Garage Doors Explained: Energy Efficiency, Style, and MoreTypes of Garage Doors Explained: Energy Efficiency, Style, and More
Types of Garage Doors Explained: Energy Efficiency, Style, and More
 
MRS PUNE 2024 - WINNER AMRUTHAA UTTAM JAGDHANE
MRS PUNE 2024 - WINNER AMRUTHAA UTTAM JAGDHANEMRS PUNE 2024 - WINNER AMRUTHAA UTTAM JAGDHANE
MRS PUNE 2024 - WINNER AMRUTHAA UTTAM JAGDHANE
 
Analysis and Assessment of Gateway Process – HemiSync(1).PDF
Analysis and Assessment of Gateway Process – HemiSync(1).PDFAnalysis and Assessment of Gateway Process – HemiSync(1).PDF
Analysis and Assessment of Gateway Process – HemiSync(1).PDF
 
The Fascinating World of Bats: Unveiling the Secrets of the Night
The Fascinating World of Bats: Unveiling the Secrets of the NightThe Fascinating World of Bats: Unveiling the Secrets of the Night
The Fascinating World of Bats: Unveiling the Secrets of the Night
 
一比一原版(McGill毕业证书)麦吉尔大学毕业证如何办理
一比一原版(McGill毕业证书)麦吉尔大学毕业证如何办理一比一原版(McGill毕业证书)麦吉尔大学毕业证如何办理
一比一原版(McGill毕业证书)麦吉尔大学毕业证如何办理
 
Capsule Wardrobe Women: A document show
Capsule Wardrobe Women:  A document showCapsule Wardrobe Women:  A document show
Capsule Wardrobe Women: A document show
 

Learning End to End

  • 1. Zurich Universities of Applied Sciences and Arts Learning End-to-End Thilo Stadelmann & Thierry Musy Datalab Seminar, 24. June 2015, Winterthur
  • 2. Zurich Universities of Applied Sciences and Arts DISCLAIMER Today’s talk is about the basic ideas of a single, inspiring, industry-proven paper from the nineties… LeCun et al., “Gradient-Based Learning Applied to Document Recognition”, 1998
  • 3. Zurich Universities of Applied Sciences and Arts Agenda Challenge The bigger picture of ML – Sequential Supervised Learning as an example where simplistic paradigms fail Proposed Solution Global learning – Graph Transformer Networks – Example: Digit string recognition with heuristic oversegmentation – Advantage of discriminative training Use Case Error propagation through multiple modules – Check reading application Conclusions Summary – Outlook to other projects – Remarks
  • 4. Zurich Universities of Applied Sciences and Arts CHALLENGE: SEQUENTIAL SUPERVISED LEARNING
  • 5. Zurich Universities of Applied Sciences and Arts Machine Learning Wikipedia on «Learning», 2015: «...the act of acquiring new, or modifying and reinforcing, existing knowledge, behaviors, skills, values, or preferences and may involve synthesizing different types of information.» A. Samuel, 1959: «…gives computers the ability to learn without being explicitly programmed.» T.M. Mitchell, 1997: «…if its performance at tasks in T, as measured by P, improves with experience E.»  In practice: Fitting parameters of a function to a set of data. (data usually handcrafted, function chosen heuristically) 5 “do something”
  • 6. Zurich Universities of Applied Sciences and Arts A landmark work in the direction of «learning» LeCun et al., “Gradient-Based Learning Applied to Document Recognition”, 1998 Outline • Gradient-Based ML  • Convolutional Neural Networks  • Comparison with other Methods • Multi-Module Systems & Graph Transformer Networks • Multiple Object Recognition & Heuristic Oversegmentation • Space Displacement Neural Networks • GTN’s as General Transducers • On-Line Handwriting Recognition System • Check Reading System 6
  • 7. Zurich Universities of Applied Sciences and Arts Standard and Sequential Supervised Learning Supervised Learning Typical assumption on data: • i.i.d. • Surrounding tasks deemed simple(r) Sequential Supervised Learning Typical assumptions on data: • Sequence information matters • Overall task has many challenging components (e.g., segmentation  recognition  sequence assembly) 7 feature vectors labels
  • 8. Zurich Universities of Applied Sciences and Arts Approaches to classifying sequential data «A bird in the hand…» approach • Train standard classifier, extend it using a sliding window and post-processing (e.g., smoothing) Direct modeling approach • Train a generative (statistical) model of the sequence generation process (e.g., HMM) «…two in the bush» approach • Build a unified pattern recognition processing chain, optimize it globally with a unique criterion See also: T.G. Dietterich, «Machine Learning for Sequential Data – A Review», 2002 8
  • 9. Zurich Universities of Applied Sciences and Arts Images sources: See references on last slide PROPOSED SOLUTION: GLOBAL LEARNING
  • 10. Zurich Universities of Applied Sciences and Arts Example: Reading handwritten strings Challenge • Identify correct character string («342») on a piece of paper • Therefore: Find correct segmentation & recognize individual characters 10
  • 11. Zurich Universities of Applied Sciences and Arts Global Learning Learning end-to-end What we know: Traditional pattern recognition system architecture What we want: Train all parameters to optimize a global performance criterion
  • 12. Zurich Universities of Applied Sciences and Arts Foundation: Gradient-based learning A trainable system composed of heterogeneous modules: Backpropagation can be used if... • cost (loss) function is differentiable w.r.t. parameters • modules are differentiable w.r.t. parameters  Gradient-based learning is the unifying concept behind many machine learning methods  Object-oriented design approach: Each module is a class with a fprop() and bprop() method
  • 13. Zurich Universities of Applied Sciences and Arts Graph Transformer Networks Network of pattern recognition modules that successively refine graph representations of the input GTNs • Operate on graphs (b) of the input instead fixed-size vectors (a) • Graph: DAG with numerical information (“penalties”) at the arcs  GTN takes gradients w.r.t. module parameters and numerical data at input arcs 13 8.9 2.1 15.3 0.8 1.7 0.3 … … …
  • 14. Zurich Universities of Applied Sciences and Arts Example: Heuristic oversegmentation …for reading handwritten strings 14 Contains all possible segmentations (arcs: penalties & images) Recognizes individual characters on single images (e.g. CNN) Combined penalties  handled via integrated training Select path with least cum. penalty For each arc in Gseg: Create new one per (character) class, penalty and label attached Sum up all penalties Loss Function: Average penalty (over all training data) of best (i.e., lowest penalty) correct path (i.e. associated with correct sequence)
  • 15. Zurich Universities of Applied Sciences and Arts Gradient w.r.t to loss function: 1 Gradient: 1 (for all arcs appearing in Viterbi path), 0 otherwise Discriminative training wins «Viterbi» training Discriminative training 15 Update: Ground Truth Loss function value: L = Ccvit Gradient: like Viterbi Transformer Gradient: Backprop for NN (one Instance per arc) Problems: 1. Trivial solution possible (Recognizer ignores input & sets all outputs to small values) 2. Penalty does not take competing answers into account (i.e., ignores training signals) Update: Desired best output (left) vs. best output (correct or not; right) Solved: Discriminative training builds the class-”separating surfaces rather than modeling individual classes independently of each other”  L=0 if best path is a correct path. Loss function value: L = Ccvit - Cvit Problem: Produces no margin as long as classification is correct (see paper for solution).
  • 16. Zurich Universities of Applied Sciences and Arts Remarks Discriminative training • Uses all available training signals • Utilizes “penalties”, not probabilities  No need for normalization  Enforcing normalization is “complex, inefficient, time consuming, ill-conditions the loss function” [according to paper] • Is the easiest/direct way to achieve the objective of classification (as opposed to Generative training, that solves the more complex density estimation task as an intermediary result) List of possible GT modules • All building blocks of (C)NNs (layers, nonlinearity etc.) • Multiplexer (though not differentiable w.r.t. to switching input)  can be used to dynamically rewire GTN architecture per input • min-function (though not differentiable everywhere) • Loss function 16
  • 17. Zurich Universities of Applied Sciences and Arts EXAMPLE: CHECK READER APPLICATION
  • 18. Zurich Universities of Applied Sciences and Arts What is the amount? 18 Courtesy amount Legal amount easy for human but time consuming interest in automating the process
  • 19. Zurich Universities of Applied Sciences and Arts A Check Reading System 19 $ 6.00Goal: Successfully read the amount of
  • 20. Zurich Universities of Applied Sciences and Arts A Check Reading System • NCR • Reading millions of checks per day • GTN-based • Aimed performance: • 50% correct • 49% reject • 1% error 20
  • 21. Zurich Universities of Applied Sciences and Arts Check graph • Input: trivial graph with a single arc that carries the image of the whole check 21
  • 22. Zurich Universities of Applied Sciences and Arts Field location Transformer • Performs classical image analysis (e.g. Connected component analysis, Ink density histogram, Layout analysis) • Heuristically extract rectangluar zones which may content the amount. Field Graph 22
  • 23. Zurich Universities of Applied Sciences and Arts Segmentation Transformer • Performs heuristic image processing techniques (here hit an deflect) to find candidate cuts. Field Graph 23
  • 24. Zurich Universities of Applied Sciences and Arts Recognition Transformer • Iterates over all arcs in the segmentation graph and runs a character recognizer [here: LeNet5 with 96 classes (ASCII: 95, rubbish: 1)] Field Graph 24 „5“ 0.2 „6“ 0.1 „7“ 0.5 CNN
  • 25. Zurich Universities of Applied Sciences and Arts Composition Transformer • Takes two graphs, the recognition graph and a grammar graph, to find valid representation of check amounts. Field Graph 25 Composition Graph with all well-formed amounts
  • 26. Zurich Universities of Applied Sciences and Arts Composition 26
  • 27. Zurich Universities of Applied Sciences and Arts Viterbi Transformer • Selects the path with the lowest accumulated penalty. Field Graph 27
  • 28. Zurich Universities of Applied Sciences and Arts Learning • Every transformer need to be differentiable • Each stage of the system has tunable parameters • Initialized with reasonable values • Trained with whole checks labeled with the correct amount • Loss function E minimized by Discriminative Forward criterion: L = Ccvit - Cvit 28
  • 29. Zurich Universities of Applied Sciences and Arts Discriminative Forward • Difference between: • Path with the lowest sum of penalties and the correct Seq • Path with the lowest sum of penalties and the allowed Seq 29
  • 30. Zurich Universities of Applied Sciences and Arts REMARKS / OUTLOOK
  • 31. Zurich Universities of Applied Sciences and Arts Conclusions Less need for manual labeling • Ground truth only needed for final result (not for every intermediate result like e.g. segmentation) Early errors can be adjusted later due to… • …unified training of all pattern recognition modules under one regime • …postponing hard decisions until the very end No call upon probability theory for modeling / justification • Occam’s razor: Choose easier discriminative model over generative one Vapnik: Don’t solve a more complex problem than necessary • No need for normalization when dealing with “penalties” instead of probabilities  no “other class” examples needed • Less constrains on system architecture and module selection 31 Example:Learningtosegmentwithout intermediatelabels
  • 32. Zurich Universities of Applied Sciences and Arts Further Reading • Original short paper: Bottou, Bengio & LeCun, “Global Training of Docu- ment Processing Systems using Graph Transformer Networks”, 1997 http://www.iro.umontreal.ca/~lisa/pointeurs/bottou-lecun-bengio-97.pdf • Landmark long paper: LeCun, Bottou, Bengio & Haffner, “Gradient- Based Learning Applied to Document Recognition”, 1998 http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf • Slide set by the original authors: Bottou, “Graph Transformer Networks”, 2001 http://leon.bottou.org/talks/gtn • Overview: Dietterich, “Machine Learning for Sequential Data: A Review”, 2002 http://eecs.oregonstate.edu/~tgd/publications/mlsd-ssspr.pdf • Recent work: Collobert, “Deep Learning for Efficient Discriminative Parsing”, 2011 http://ronan.collobert.com/pub/matos/2011_parsing_aistats.pdf