This document discusses using global learning and graph transformer networks (GTNs) to solve sequential supervised learning problems. It proposes training all modules of a machine learning system end-to-end with backpropagation, optimizing a single global criterion rather than training each module independently on its own local objective. As an example, it describes a system for reading handwritten check amounts that segments the image into candidate characters, recognizes them, and assembles the final amount using a graph transformer network trained discriminatively.
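The core idea of end-to-end (global) training can be illustrated with a minimal sketch, independent of the check-reading system itself: two modules (a hypothetical feature extractor and a classifier, stand-ins for the segmenter and recognizer) are trained jointly by backpropagating one global loss through both, rather than fitting each module to its own target. The data and architecture here are toy assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 2D points in two classes (a stand-in for character images).
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Module 1: feature extractor (linear + tanh).
W1 = rng.normal(scale=0.1, size=(2, 8))
# Module 2: classifier (linear -> logits).
W2 = rng.normal(scale=0.1, size=(8, 2))

def forward(X, W1, W2):
    h = np.tanh(X @ W1)   # module 1 output
    logits = h @ W2       # module 2 output
    return h, logits

def global_loss(logits, y):
    # Single global criterion: mean cross-entropy on the pipeline's output.
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(y)), y].mean()

lr = 0.5
losses = []
for _ in range(200):
    h, logits = forward(X, W1, W2)
    losses.append(global_loss(logits, y))
    # Backward pass: the gradient of the ONE global loss flows through
    # both modules, so module 1 is shaped by the final task, not by a
    # separately defined intermediate target.
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(y)), y] -= 1
    dlogits = p / len(y)
    dW2 = h.T @ dlogits
    dh = dlogits @ W2.T
    dW1 = X.T @ (dh * (1 - h**2))  # chain rule through module 1
    W1 -= lr * dW1
    W2 -= lr * dW2

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

If module 1 were instead trained on a hand-designed intermediate objective and frozen, its features might be suboptimal for the final decision; joint training lets the global criterion arbitrate such trade-offs automatically, which is the argument the document makes for the GTN approach.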