Locating and retrieving document images such as technical articles and newspapers requires a text localization technique. The proposed method detects and extracts homogeneous text areas in document images, regardless of font type and size, by using connected components analysis to detect blocks of foreground objects. Next, a descriptor consisting of a set of structural features is extracted from the resulting merged blocks and used as input to a trained Support Vector Machine (SVM). Finally, the SVM output classifies each block as text or non-text.
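The first two stages described above can be illustrated with a minimal sketch, written under several assumptions. The abstract does not name the structural features, so the two used here (`aspect_ratio` and `density`) are hypothetical stand-ins, as are the function names `connected_components` and `describe`. The sketch assumes an already binarized page (1 = foreground, 0 = background) and 8-connectivity, and it leaves out both the block-merging step and the SVM itself; in practice the descriptor would be passed to a trained classifier.

```python
from collections import deque


def connected_components(img):
    """Find 8-connected foreground blobs in a binary image (list of rows).

    Returns one dict per blob with its bounding box (top, left, bottom, right)
    and its pixel count. Connectivity choice is an assumption of this sketch.
    """
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append({"box": (top, left, bottom, right), "area": area})
    return components


def describe(component):
    """Build a hypothetical structural descriptor for one block."""
    top, left, bottom, right = component["box"]
    height = bottom - top + 1
    width = right - left + 1
    return {
        "aspect_ratio": width / height,                   # shape of the bounding box
        "density": component["area"] / (width * height),  # foreground fill ratio
    }


# Tiny binarized "page" containing two separate foreground blobs.
page = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
components = connected_components(page)
descriptors = [describe(comp) for comp in components]
```

Each entry of `descriptors` corresponds to one detected block. A real pipeline would convert these values into a fixed-length feature vector and let the trained SVM decide whether the block is text.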