This document describes a system for extracting Punjabi words from machine-printed document images. It begins with an introduction to optical character recognition (OCR) and a literature review of prior work on recognizing characters in various scripts, including Gurmukhi (the script used for Punjabi). The proposed system first preprocesses each image through steps such as binarization, noise removal, and skew detection. It then segments the image into lines, words, and characters. Features are extracted from the character segments and classified by a neural network to recognize the Punjabi words. The system takes scanned documents as input and outputs the extracted Punjabi words, with the goal of high accuracy. It was developed in MATLAB and aims to help with processing literature and texts in the Punjabi language.
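The binarization and line/word segmentation steps named above can be illustrated with a small sketch. The original system was developed in MATLAB and its exact methods are not given here, so this is a Python illustration under stated assumptions: a fixed global threshold for binarization and projection profiles for segmentation are assumed techniques, and every function name (`binarize`, `runs`, `segment_lines`, `segment_words`, `extract_word_boxes`) is hypothetical. Noise removal, skew correction, character segmentation, and the neural network classifier are left out.

```python
from typing import List, Tuple

Image = List[List[int]]   # grayscale rows, 0 (black) to 255 (white)
Span = Tuple[int, int]    # (start, end) with end exclusive


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Assumed global threshold: 1 marks ink (dark pixel), 0 marks background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def runs(profile: List[int], min_gap: int = 1) -> List[Span]:
    """Find stretches of non-zero values, closing a stretch only after
    at least `min_gap` consecutive zeros (shorter gaps are bridged)."""
    spans: List[Span] = []
    start = None
    gap = 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                spans.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:
        spans.append((start, len(profile) - gap))
    return spans


def segment_lines(binary: Image) -> List[Span]:
    """Horizontal projection: rows containing ink form text lines."""
    row_profile = [sum(row) for row in binary]
    return runs(row_profile, min_gap=1)


def segment_words(binary: Image, line: Span, min_gap: int = 2) -> List[Span]:
    """Vertical projection inside one line: ink-free column gaps of at
    least `min_gap` pixels are treated as word boundaries."""
    top, bottom = line
    width = len(binary[0]) if binary else 0
    col_profile = [sum(binary[r][c] for r in range(top, bottom))
                   for c in range(width)]
    return runs(col_profile, min_gap=min_gap)


def extract_word_boxes(gray: Image, threshold: int = 128,
                       min_gap: int = 2) -> List[Tuple[int, int, int, int]]:
    """Return (top, bottom, left, right) boxes for every word candidate."""
    binary = binarize(gray, threshold)
    boxes = []
    for top, bottom in segment_lines(binary):
        for left, right in segment_words(binary, (top, bottom), min_gap):
            boxes.append((top, bottom, left, right))
    return boxes
```

Each returned box would then be passed to character segmentation and feature extraction. The `min_gap` value is an assumption that would need tuning to the scan resolution and font, since the gap between words must be told apart from any smaller gaps inside a word.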