This document presents an end-to-end text recognition system using convolutional neural networks. It uses a simple two-stage framework: (1) a convolutional neural network for text detection and classification, and (2) high-level inference like lexicon search to recognize text in full images or scenes. The system achieves state-of-the-art performance on standard benchmarks like cropped character classification, cropped word recognition, and end-to-end recognition with a lexicon. It provides a simple yet effective approach for scene text recognition using learned features from a two-layer convolutional network.