The document presents a methodology for isolated Bangla word recognition and speaker detection using a Semantic Modular Time Delay Neural Network (MTDNN). The authors collected a dataset of 690 audio samples from 5 speakers uttering 46 Bangla words. They used the MTDNN framework to classify words and identify speakers with an accuracy of 82.66%. This represents an improvement over previous work on Bangla speech recognition. The authors conclude the MTDNN approach is effective for Bangla word recognition and speaker detection but recommend expanding the dataset and vocabulary in future work.