This document describes a proposed virtual assistant system that accepts visual and voice input. The system is designed to carry out desktop and internet tasks from natural-language commands, so that users need neither technical knowledge nor manual typing. It will use speech recognition to interpret voice commands and complete tasks such as playing media files, opening applications, taking screenshots, and conducting web searches. The proposed system will be a software application that users can download and install on their PCs. It aims to give non-technical users an easy-to-use interface for controlling their computers by voice.
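
To make the command-handling idea concrete, the following is a minimal sketch of how a recognized utterance might be mapped to one of the four task types named above. It assumes the speech recognizer has already produced a text transcript. The prefix table, the action names, the functions `parse_command` and `search_url`, and the `example.com` search endpoint are all hypothetical illustrations, not part of the proposed design.

```python
from urllib.parse import quote_plus

# Hypothetical prefix-to-action table; the proposal does not specify one.
# Longer prefixes come first so "search for x" is not parsed as "search" + "for x".
PREFIXES = [
    ("play ", "play_media"),
    ("open ", "open_application"),
    ("search for ", "web_search"),
    ("search ", "web_search"),
]


def parse_command(transcript):
    """Map a recognized utterance to an (action, argument) pair.

    The transcript is lowercased and its whitespace collapsed, so the
    argument loses its original casing. Nothing is executed here; a
    separate layer would act on the returned pair.
    """
    text = " ".join(transcript.lower().split())
    for prefix, action in PREFIXES:
        if text.startswith(prefix):
            return (action, text[len(prefix):].strip())
    # Screenshot requests carry no argument, so a keyword match is enough.
    if "screenshot" in text:
        return ("take_screenshot", "")
    return ("unknown", text)


def search_url(query):
    """Build a search URL for a placeholder endpoint (an assumption)."""
    return "https://example.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    for spoken in ["Open Notepad", "Take a screenshot", "Search for weather today"]:
        print(spoken, "->", parse_command(spoken))
```

Keeping parsing separate from execution means the mapping can be checked without launching applications or a browser. A real system would likely need a more tolerant matcher than fixed prefixes, since spoken commands vary in wording.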