1) The document presents a new model, Interpretable Reinforcement Learning Counter (IRLC), for the counting subtask of visual question answering: answering "how many" questions about natural images. 2) IRLC uses an object detector to ground the question in a set of candidate image regions, then uses an LSTM to build the count by selecting detected objects one at a time, yielding an interpretable, step-by-step counting process. 3) Experimental results on the HowMany-QA benchmark (counting questions drawn from VQA 2.0 and Visual Genome) show that IRLC outperforms prior counting approaches such as SoftCount and UpDown in both accuracy and root mean squared error.
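The incremental selection process described in sentence 2 can be sketched in plain NumPy. This is an illustrative greedy decode only, not the authors' implementation: the function names, the additive `interaction` matrix (which stands in for IRLC's learned pairwise score updates that, e.g., suppress overlapping duplicate detections), and the zero terminal threshold are all assumptions for the example.

```python
import numpy as np

def sequential_count(scores, interaction):
    """Greedy sketch of IRLC-style counting (illustrative assumption,
    not the paper's exact model).

    scores      : (n,) initial question-conditioned score per detected object
    interaction : (n, n) additive update applied to all scores when an
                  object is selected (e.g. penalizing overlapping boxes)

    Repeatedly selects the highest-scoring remaining object; stops when
    the best remaining score drops to or below 0 (a stand-in for the
    terminal action). Returns the count and the selected indices, which
    make the counting process inspectable.
    """
    scores = np.asarray(scores, dtype=float).copy()
    n = len(scores)
    available = np.ones(n, dtype=bool)
    selected = []
    for _ in range(n):
        idx = int(np.argmax(np.where(available, scores, -np.inf)))
        if scores[idx] <= 0.0:  # terminal action wins: stop counting
            break
        selected.append(idx)
        available[idx] = False
        # Selecting object idx shifts every remaining object's score.
        scores = scores + interaction[idx]
    return len(selected), selected

# Toy example: objects 0 and 1 are confident matches; each selection
# strongly suppresses the chosen object and mildly suppresses the rest.
scores = np.array([2.0, 1.5, -1.0])
interaction = np.array([[-5.0, -0.2, -0.2],
                        [-0.2, -5.0, -0.2],
                        [-0.2, -0.2, -5.0]])
count, picks = sequential_count(scores, interaction)
print(count, picks)  # → 2 [0, 1]
```

The per-step selections (`picks`) are what make the count interpretable: each increment is tied to a specific grounded object, unlike a regression model that emits a single scalar.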