This document is a report submitted by Prasun Chakraborty to KIIT Deemed to be University in partial fulfillment of the requirements for a Bachelor's degree in Electronics and Computer Science Engineering. The report describes the development of a desktop assistant application that uses voice commands to complete specified tasks. Python programming language and libraries like OS, datetime, random, PyOWM, PyAutoGUI, Requests, and Twilio.rest are used to develop the voice assistant application.
BUILDING A DESKTOP ASSISTANT THAT USES VOICE COMMAND
AND COMPLETES SPECIFIED TASKS
SUBMITTED TO
KIIT Deemed to be University
In Partial Fulfillment of the Requirement for the Award of
BACHELOR’S DEGREE IN
ELECTRONICS AND COMPUTER SCIENCE ENGINEERING
BY
PRASUN CHAKRABORTY ROLL-1730041
UNDER THE GUIDANCE OF
PROF.CHANDANI KUMARI
SCHOOL OF ELECTRONICS ENGINEERING
KALINGA INSTITUTE OF INDUSTRIAL TECHNOLOGY
BHUBANESWAR, ODISHA - 751024
KIIT Deemed to be University
School of Electronics Engineering
BHUBANESWAR, ODISHA - 751024
CERTIFICATE
This is to certify that the report entitled
“BUILDING A DESKTOP ASSISTANT THAT USES VOICE COMMAND AND COMPLETES SPECIFIED TASKS”
BY
PRASUN CHAKRABORTY ROLL-1730041
is a record of bona fide work carried out by them, in partial fulfillment of the
requirement for the award of the degree of Bachelor of Engineering in Electronics and
Computer Science Engineering at KIIT Deemed to be University, Bhubaneswar. This
work was done during the year 2020, under my guidance.
Date: / /
(Prof. Guide Name)
CHANDANI KUMARI
ACKNOWLEDGEMENTS
The success and final outcome of this project required a lot of guidance and assistance
from Prof. CHANDANI KUMARI, and I am extremely privileged to have received this
throughout the completion of my project. All that I have done is due to such supervision
and assistance, and I would not forget to thank her.
I respect and thank Prof. CHANDANI KUMARI for providing me an opportunity to do
the project work from home during this pandemic and for giving me all the support and
guidance that made it possible to complete the project. I am extremely thankful to her for
providing such kind support and guidance despite her busy schedule managing academic
affairs.
PRASUN CHAKRABORTY
ABSTRACT
A virtual assistant for the desktop, also called a digital assistant, is an application
program that understands natural-language voice commands and completes tasks for the user.
Most digital assistants are operated by human voice, so they may also
be referred to as voice assistants. To interact with a digital assistant, one must first say
a wake word, which activates the device. Once the wake word has been spoken, the system is
ready to be asked a question. One could then ask “what about the weather” and the
system will read the forecast for the local area aloud.
As digital assistants become more popular, their capabilities and the range of tasks
they can perform grow as well. Below are a few of the popular activities this desktop
assistant can perform.
Answering basic questions
Searching Google/Wikipedia
Setting an alarm or timer
Getting information about the temperature
Playing a song
Reading and writing text files, and many more
In this project, the Python programming language is used to develop the application.
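The wake-word behaviour described above can be sketched in a few lines of Python. This is only an illustration working on already-transcribed text: the wake word "assistant", the function names and the two example commands are assumptions made for this sketch and are not taken from the project's code.

```python
from datetime import datetime

WAKE_WORD = "assistant"  # hypothetical wake word; the report does not name one


def extract_command(transcript, wake_word=WAKE_WORD):
    """Return the text spoken after the wake word, or None if it is absent."""
    words = transcript.lower().split()
    if wake_word not in words:
        return None
    start = words.index(wake_word) + 1
    return " ".join(words[start:])


def handle(command):
    """Pick a reply for a command using simple keyword matching."""
    if "time" in command:
        return "The time is " + datetime.now().strftime("%H:%M")
    if "weather" in command:
        return "Looking up the weather for the local area"
    return "Sorry, I did not understand that"


def respond(transcript):
    """Stay idle (return None) until the wake word is heard, then reply."""
    command = extract_command(transcript)
    if command is None:
        return None
    return handle(command)
```

For example, respond("assistant what about the weather") returns the weather reply, while the same question without the wake word returns None, so the assistant stays idle.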
CONTENTS
1. Literature Review
1.1 An overview of speech recognition
1.2 History
2. Types of speech recognition
2.1 Isolated speech
2.2 Connected speech
2.3 Continuous speech
2.4 Spontaneous speech
3. Basic speech recognition process
4. Introduction to Python
5. Python libraries used in this project
6. Tools required
7. Use case diagram
8. List of tasks this application performs
9. Uses of speech recognition
10. Applications
10.1 From medical perspective
10.2 From military perspective
10.3 From education perspective
11. Some factors that may disturb functionalities of the application
12. The future of speech recognition
13. References
1. Literature Review
1.1 An overview of Speech Recognition
Speech recognition is a technology that enables a computer to capture the words
spoken by a human with the help of a microphone. These words are then recognized by the
speech recognizer, and finally the system acts according to the voice input.
The process of speech recognition consists of different steps, which are discussed one by
one in the following sections.
1.2 History
The concept of speech recognition started sometime in the 1940s; in practice, the first
speech recognition program appeared in 1952 at Bell Labs, and it dealt with the
recognition of digits in a noise-free environment.
The 1940s and 1950s are considered the foundational period of speech recognition
technology. In this period, work was done on the foundational paradigms of speech
recognition, namely automation and information-theoretic models. The key technologies
developed in this period were filter banks and time normalization methods.
In the 1990s, the key technologies developed were methods for
stochastic language understanding, statistical learning of acoustic and language models,
and methods for implementing large-vocabulary speech understanding systems.
After five decades of research, speech recognition technology has finally entered
the marketplace, benefiting users in a variety of ways. The challenge of designing a
machine that truly functions like an intelligent human is still a major one going forward.
2. Types of Speech Recognition
Speech recognition systems can be divided into a number of classes based on the kinds of
utterances they are able to recognize and the list of words they have. A few classes of
speech recognition are described below:
2.1 Isolated Speech
Isolated-word systems usually require a pause between two utterances. This does not mean
that they accept only a single word; rather, they require one utterance at a time.
2.2 Connected Speech
Connected words, or connected speech, are similar to isolated speech but allow separate
utterances with only a minimal pause between them.
2.3 Continuous Speech
Continuous speech allows the user to speak almost naturally. It is also called
computer dictation.
2.4 Spontaneous Speech
At a basic level, it can be thought of as speech that is natural sounding and not
rehearsed. An ASR system with spontaneous speech ability should be able to handle a
variety of natural speech features such as words being run together, “ums” and
“ahs”, and even slight stutters.
3. Basic Speech Recognition Process
Audio Input : The audio (human voice) is input to the system with the help of a
microphone.
Analog to Digital : The process of converting an analog signal into digital form is
known as digitization. It involves both sampling and quantization.
Acoustic Model : An acoustic model is created by taking audio inputs and their text
transcripts and using software to create statistical representations of the sounds
that make up each word.
Language Model : Language modeling is used in many natural language processing
applications such as speech recognition. It tries to capture the properties of a language
and to predict the next word in a speech sequence.
Speech Engine : The job of the speech engine is to convert the input audio into text.
To accomplish this, it uses all sorts of data, software algorithms and statistics.
Output : After all the above steps, the final output is the operation performed
according to the voice command.
[Figure: block diagram of the speech recognition process with the stages Audio Input,
Analog to Digital, Acoustic Model, Language Model, Speech Engine and Output]
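The Analog to Digital step can be illustrated with a small sketch. It is not taken from the project code: the function name sample_and_quantize, the 440 Hz test tone and the 8 kHz / 8-bit settings are assumptions chosen only to show sampling (reading the signal at fixed time steps) and quantization (rounding each reading to one of a fixed number of integer levels).

```python
import math

def sample_and_quantize(freq_hz, sample_rate, duration_s, bits):
    """Sample a sine tone and quantize each sample to a signed integer."""
    max_level = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit audio
    n_samples = int(sample_rate * duration_s)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                    # sampling: discrete time steps
        amplitude = math.sin(2 * math.pi * freq_hz * t)   # value in [-1, 1]
        samples.append(round(amplitude * max_level))      # quantization
    return samples

# Hypothetical settings: a 440 Hz tone, 8000 samples per second, 8-bit levels
digital = sample_and_quantize(freq_hz=440, sample_rate=8000, duration_s=0.01, bits=8)
print(len(digital), min(digital), max(digital))
```

A real microphone signal is digitized by the sound card; a sketch like this only mimics the two operations on a synthetic tone.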
4. Introduction to Python
4.1 Python : Python is an interpreted, high-level, general-purpose programming
language. Created by Guido van Rossum and first released in 1991, Python's design
philosophy emphasizes code readability with its notable use of significant whitespace.
It is a high-level programming language which is:
Interpreted: Python is processed at run time by the interpreter.
Interactive: A Python prompt can be used to interact with the interpreter
directly to write programs.
Object-oriented: Python supports the object-oriented technique of programming.
Beginner's language: Python is a great language for beginner-level
programmers and supports the development of a wide range of applications.
In this project, different techniques have been used for different functionalities.
These are discussed one by one.
5. Python Libraries used in this project
OS : The os module in Python provides functions for interacting with the
operating system. It is one of Python's standard utility modules and provides a
portable way of using operating-system-dependent functionality.
Datetime : Python has a module named datetime for working with dates and times.
Random : Sometimes we want the computer to pick a random number in a given
range, pick a random element from a list, pick a random card from a deck, flip a
coin, etc. The random module provides access to functions that support these types
of operations. The random module is another library of functions that can extend
the basic features of python.
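As a minimal sketch of how an assistant might use the random module, the example below picks a random reply and flips a coin. The reply list and the function names pick_greeting and flip_coin are hypothetical and are not taken from the project code.

```python
import random

# Hypothetical replies; the real assistant's phrases are not given in this report
GREETINGS = ["Hello!", "Hi there!", "Good to see you!"]

def pick_greeting(rng=random):
    """Pick a random element from a list."""
    return rng.choice(GREETINGS)

def flip_coin(rng=random):
    """Pick a random integer in the range 0..1 and map it to a coin side."""
    return "heads" if rng.randint(0, 1) == 0 else "tails"

print(pick_greeting())
print(flip_coin())
```

Passing a seeded random.Random instance as rng makes the choices repeatable, which is convenient when testing.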
PyOWM : PyOWM is a client Python wrapper library for Open Weather Map web
APIs. It allows quick and easy consumption of OWM data from Python
applications via a simple object model and in a human-friendly fashion.
PyAutoGUI : PyAutoGUI is a cross-platform GUI automation Python module for
human beings. It is used to programmatically control the mouse and keyboard.
PyAutoGUI supports Python 2 and 3.
Requests : Requests is an Apache2-licensed HTTP library written in Python. It is
designed to be easy for humans to use, which means one does not have to manually
add query strings to URLs or form-encode POST data.
Requests allows one to send HTTP/1.1 requests using Python. With it, one can
add content like headers, form data, multipart files and parameters using simple
Python objects, and access the response data in the same way.
Twilio.rest : The Twilio Python Helper Library makes it easy to interact with the
Twilio API from a Python application. It supports applications written in
Python 2.7 and above. Using the Twilio API one can automate sending WhatsApp
messages, making calls and sending verification codes. In this project the Twilio
API is used to send a help message to a preset contact in an emergency
situation.
Webbrowser : The webbrowser module provides a high-level interface for
displaying Web-based documents to users. Under most circumstances, simply
calling the open() function from this module opens the URL in the default browser.
One has to import the module and call the open() function:
webbrowser.open("URL", new=2)
If new is 0, the URL is opened in the same browser window if possible. If new is 1, a
new browser window is opened if possible. If new is 2, a new browser page ("tab") is
opened if possible.
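A small sketch of how a search command could be turned into a browser call is given here. The helper names build_search_url and open_in_new_tab and the Google search URL pattern are assumptions for illustration; only webbrowser.open() and urllib.parse.quote_plus() are standard library calls.

```python
import webbrowser
from urllib.parse import quote_plus

def build_search_url(query):
    """Build a search URL; quote_plus makes spaces and symbols URL-safe."""
    return "https://www.google.com/search?q=" + quote_plus(query)

def open_in_new_tab(url):
    """Ask the default browser to open the URL in a new tab (new=2)."""
    return webbrowser.open(url, new=2)

print(build_search_url("speech recognition"))
```

Calling open_in_new_tab(build_search_url("speech recognition")) would then open the results page; it is left uncalled here so the example can run without a browser.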
Pyttsx : pyttsx is a cross-platform text-to-speech library. The major advantage of
using this library for text-to-speech conversion is that it works offline. However,
pyttsx supports only Python 2.x. Hence, this project uses pyttsx3, which is modified
to work on both Python 2.x and Python 3.x with the same code.
Speech-Recognition : Speech recognition has its roots in research done at Bell Labs
in the early 1950s. Early systems were limited to a single speaker and had limited
vocabularies of about a dozen words. Modern speech recognition systems have
come a long way since their ancient counterparts. They can recognize speech from
multiple speakers and have enormous vocabularies in numerous languages.
The first component of speech recognition is, of course, speech. Speech must be
converted from physical sound to an electrical signal with a microphone, and
then to digital data with an analog-to-digital converter. Once digitized, several
models can be used to transcribe the audio to text.
Most modern speech recognition systems rely on what is known as a Hidden
Markov Model (HMM). This approach works on the assumption that a speech
signal, when viewed on a short enough timescale (say, ten milliseconds), can be
reasonably approximated as a stationary process—that is, a process in which
statistical properties do not change over time.
In a typical HMM, the speech signal is divided into 10-millisecond fragments. The
power spectrum of each fragment, which is essentially a plot of the signal’s
power as a function of frequency, is mapped to a vector of real numbers known
as cepstral coefficients. The dimension of this vector is usually small—sometimes as
low as 10, although more accurate systems may have dimension 32 or more.
The final output of the HMM is a sequence of these vectors.
To decode the speech into text, groups of vectors are matched to one or
more phonemes, the fundamental units of speech. This calculation requires training,
since the sound of a phoneme varies from speaker to speaker, and even varies from
one utterance to another by the same speaker. A special algorithm is then applied
to determine the most likely word (or words) that produce the given sequence of
phonemes.
One can imagine that this whole process may be computationally expensive. In
many modern speech recognition systems, neural networks are used to simplify the
speech signal using techniques for feature transformation and dimensionality
reduction before HMM recognition. Voice activity detectors (VADs) are also used
to reduce an audio signal to only the portions that are likely to contain speech.
This prevents the recognizer from wasting time analyzing unnecessary parts of the
signal.
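The framing and voice activity detection ideas above can be sketched in a few lines. This is a toy illustration only: it splits a signal into 10-millisecond fragments and marks a fragment as speech when its average energy exceeds a threshold. Real recognizers use far more robust detectors, and the function names, the threshold and the synthetic signal are all assumptions.

```python
def split_into_frames(samples, sample_rate, frame_ms=10):
    """Cut the signal into consecutive fragments of frame_ms milliseconds."""
    frame_len = sample_rate * frame_ms // 1000
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def frame_energy(frame):
    """Average squared amplitude of one fragment."""
    return sum(s * s for s in frame) / len(frame)

def detect_speech(frames, threshold):
    """Toy voice activity detector: True where the fragment is loud enough."""
    return [frame_energy(f) > threshold for f in frames]

sample_rate = 8000                      # assumed sampling rate in Hz
silence = [0] * 80                      # 10 ms of silence
loud = [1000, -1000] * 40               # 10 ms of a loud synthetic signal
signal = silence + loud + silence

frames = split_into_frames(signal, sample_rate)
print(detect_speech(frames, threshold=100.0))   # [False, True, False]
```

Only the fragments marked True would be passed on to the recognizer, which is how a VAD saves processing time.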
Wikipedia : The Internet is the single largest source of information, and therefore it
is important to know how to fetch data from various sources. Wikipedia is one of
the largest and most popular sources of information on the Internet.
It is a multilingual online encyclopedia created and maintained as an open
collaboration project by a community of volunteer editors using a wiki-based
editing system.
Smtplib : Simple Mail Transfer Protocol (SMTP) is a protocol, which handles
sending e-mail and routing e-mail between mail servers.
Python provides smtplib module, which defines an SMTP client session object that
can be used to send mail to any Internet machine with an SMTP or ESMTP
listener daemon.
The SMTP object is created with the following parameters:
Host - The host running the SMTP server. One can specify the IP address of the
host or a domain name like facebook.com. This argument is optional.
Port - If the host argument is provided, one needs to specify the port on which the
SMTP server is listening. Usually this port is 25.
Local-hostname - If the SMTP server is running on the local machine, one
can specify just localhost for this option.
An SMTP object has an instance method called sendmail, which is typically used to
do the work of mailing a message. It takes the parameters:
The sender - A string with the address of the sender.
The receivers - A list of strings, one for each recipient.
The message - A message as a string formatted as specified in the various
RFCs.
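The parameters above can be put together as in the following sketch. The helper names build_message and send_mail and the example.com addresses are placeholders, not values from the project. Many real mail servers also require TLS and a login step, which are omitted here.

```python
import smtplib
from email.message import EmailMessage

def build_message(sender, receivers, subject, body):
    """Create an RFC-formatted e-mail message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(receivers)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def send_mail(host, port, sender, receivers, msg):
    """Open an SMTP session with host:port and mail the message."""
    with smtplib.SMTP(host, port) as server:
        # sendmail(sender, list of receivers, message as a string)
        server.sendmail(sender, receivers, msg.as_string())

# Placeholder addresses; nothing is sent until send_mail() is called
message = build_message("me@example.com", ["you@example.com"],
                        "Test", "Sent by the voice assistant")
print(message["Subject"])
```

Sending would then be a call such as send_mail("localhost", 25, "me@example.com", ["you@example.com"], message), assuming an SMTP server is listening there.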
Playsound : The playsound module is the simplest module to use for playing sound.
This module works on both Python 2 and Python 3, and is tested to play wav and
mp3 files only. It contains only one method, named playsound(), with one
argument to take the audio filename for playing.
Plyer : Plyer is a Python library for accessing features of hardware / platforms.
Ctypes : Ctypes is a foreign function library for Python. It provides C compatible
data types, and allows calling functions in DLLs or shared libraries. It can be used
to wrap these libraries in pure Python.
Psutil : Psutil is a Python cross-platform library used to access system details and
process utilities. It is used to keep track of various resources utilization in the
system. Usage of resources like CPU, memory, disks, network, sensors can be
monitored. Hence, this library is used for system monitoring, profiling, limiting
process resources and the management of running processes. It is supported in
Python versions 2.6, 2.7 and 3.4+.
Urllib : urllib is a package that collects several modules for working with URLs:
urllib.request for opening and reading URLs
urllib.error containing the exceptions raised by urllib.request
urllib.parse for parsing URLs
urllib.robotparser for parsing robots.txt files
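As a short illustration of urllib.parse, the sketch below splits a URL into its parts and builds a query string. The YouTube search URL is only an example value and is not taken from the project code.

```python
from urllib.parse import urlparse, parse_qs, urlencode

# Example URL used only for illustration
url = "https://www.youtube.com/results?search_query=python+tutorial"

parts = urlparse(url)
print(parts.netloc)            # www.youtube.com
print(parts.path)              # /results
print(parse_qs(parts.query))   # {'search_query': ['python tutorial']}

# Going the other way: build a query string from a dictionary
query = urlencode({"search_query": "speech recognition"})
print(query)                   # search_query=speech+recognition
```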
Pyspeedtest : A Python library to test network bandwidth using Speedtest.net servers.
One can check ping, download speed and upload speed using this library.
Pandas : pandas is a fast, powerful, flexible and easy to use open source data
analysis and manipulation tool, built on top of the Python programming language.
Matplotlib : Matplotlib is an amazing visualization library in Python for 2D plots
of arrays. Matplotlib is a multi-platform data visualization library built on NumPy
arrays and designed to work with the broader SciPy stack. It was introduced by
John Hunter in the year 2002.
One of the greatest benefits of visualization is that it allows us visual access to huge
amounts of data in easily digestible visuals. Matplotlib consists of several plots like
line, bar, scatter, histogram etc.
Beautifulsoup : Beautiful Soup is a Python package for parsing HTML and XML
documents (including having malformed markup, i.e. non-closed tags, so named
after tag soup). It creates a parse tree for parsed pages that can be used to extract
data from HTML, which is useful for web scraping.
Tabulate : Pretty-print tabular data in Python, a library and a command-line
utility.
The main use cases of the library are:
printing small tables without hassle: just one function call, formatting is
guided by the data itself
authoring tabular data for lightweight plain-text markup: multiple output
formats suitable for further editing or transformation
readable presentation of mixed textual and numeric data: smart column
alignment, configurable number formatting, alignment by a decimal point
Numpy : NumPy is a Python library used for working with arrays. It also has
functions for working in the domains of linear algebra, Fourier transforms and
matrices. NumPy was created in 2005 by Travis Oliphant. It is an open source
project and can be used freely. NumPy stands for Numerical Python.
Opencv : OpenCV-Python is a library of Python bindings designed to solve
computer vision problems. OpenCV-Python makes use of NumPy, which is a
highly optimized library for numerical operations with a MATLAB-style syntax.
All the OpenCV array structures are converted to and from NumPy arrays.
It is also a free open source library used in real-time image processing. It is used
to process images, videos and even live streams.
Wave : The wave module in Python's standard library is an easy interface to the
audio WAV format. The functions in this module can write audio data in raw
format to a file like object and read the attributes of a WAV file.
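A minimal sketch of the wave module follows. It writes a few 16-bit samples to an in-memory file-like object and reads the attributes back. The helper names write_wav and read_wav_info and the sample values are assumptions; the project itself presumably writes recordings to disk.

```python
import io
import struct
import wave

def write_wav(samples, sample_rate):
    """Write 16-bit mono samples to an in-memory WAV file and return its bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)              # mono
        wav_file.setsampwidth(2)              # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        # Pack the integers as little-endian signed 16-bit raw audio data
        wav_file.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buffer.getvalue()

def read_wav_info(data):
    """Return (channels, sample width, frame rate, number of frames)."""
    with wave.open(io.BytesIO(data), "rb") as wav_file:
        return (wav_file.getnchannels(), wav_file.getsampwidth(),
                wav_file.getframerate(), wav_file.getnframes())

wav_bytes = write_wav([0, 1000, -1000, 0], sample_rate=8000)
print(read_wav_info(wav_bytes))   # (1, 2, 8000, 4)
```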
6. Tools Required
Hardware : Monitor/Display
Software : Windows 10
Visual Studio Code(IDE)
Google-Chrome browser
Python version 3.7
7. Diagrams
8. List of tasks this application performs
When this application is executed, the user has to set some input fields:
(i) The target WhatsApp number and the body of the message. These have to be set at
the start because an emergency alarm feature has been added to this desktop
assistant. To elaborate on this feature, consider the example below.
E.g. Suppose a person leaves home for some purpose and, on the road, finds himself in
a threatening situation: some people try to attack him, force him to do something or
take him away with them. He shouts "Help me, help me please!" but there is nobody
nearby to help. Using the emergency alarm feature, he can ask for help directly from a
police station, a hospital or any other emergency service. There may be no people to
hear his cry, but his voice assistant is running on his laptop or mobile phone. When he
shouts "help me!", the assistant listens, recognizes his voice and sends a preset help
message to the emergency contact within a few seconds. He can thus inform someone
about his problem without calling, texting or even touching his mobile phone, using
only a voice command.
To be able to use this feature in an emergency, he first needs to set the inputs: the
target WhatsApp number (the number to which the message about his problem should be
sent) and the message body (for example, the message he wants to send to the police
station). Once these are set, his digital assistant is ready to help him outside his
home too. The recommended format for the message body is
<Myself XYZ, My address is UXV, My contact number is XXXXXXXXX, I came to
market, now I am in trouble please help me !!>
All of this has to be set before going outside, so that in a critical situation the
message can be sent using only the "help me!" command; in such a situation he would
not have time to give all the details needed to trace him.
[**NOTE : For this feature to work, the target WhatsApp number must be registered
with the Twilio account, because this project uses the Twilio API for this
purpose.**]
(ii) When the application starts, it checks whether any of the user's friends or
acquaintances has a birthday on that day. The user just has to set each person's date
of birth in the application. If the current date matches a preset date, the
application shows a birthday wish reminder; otherwise it pushes a notification, as shown
in the image below.
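The date check behind the birthday reminder can be sketched as follows. The names and dates in BIRTHDAYS and the function birthdays_today are hypothetical; the report does not show how the project stores this data.

```python
from datetime import date

# Hypothetical entries: name -> (month, day) of birth
BIRTHDAYS = {"Asha": (3, 14), "Ravi": (11, 2)}

def birthdays_today(birthdays, today=None):
    """Return the names whose birthday falls on the given day (default: today)."""
    today = today or date.today()
    return [name for name, (month, day) in birthdays.items()
            if (month, day) == (today.month, today.day)]

names = birthdays_today(BIRTHDAYS)
if names:
    print("Birthday reminder for: " + ", ".join(names))
else:
    print("No birthdays today")
```

Only the month and day are compared, so the reminder repeats every year regardless of the year of birth.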
More Tasks and their Commands
Voice Command : Task Description
Ok bro : Activation point for speech recognition
Jarvis : Checks whether Jarvis is listening to the user
Who are you ? : Tells the system name, that is, 'Jarvis'
tell me something about you : Gives a basic description of itself
temperature : Reports the current air temperature
check my battery status : Gives the battery details
check the connection status : Says whether the user is connected to the internet and pushes a notification
check internet speed : Checks and tells the upload, download and ping speed of the user's network
<user's query> wikipedia : Searches Wikipedia, shows the result and speaks it aloud
google search <query> : Opens the Google Chrome browser and displays the results for the query
google : Opens Google for the user
google maps : Opens Google Maps
google drive : Opens Google Drive for the user
google translate : Opens Google Translate for the user
find location <place_name> : Searches for the place on Google Maps and shows the result in the browser
open youtube : Opens the YouTube homepage
search youtube <query> : Searches and shows the matching content for the user's query
open udemy : Opens the Udemy homepage
search udemy <course_name> : Shows all the matching courses
find geeksforgeeks <subject> : Opens GeeksforGeeks and shows the matching results
open mail : Opens the user's personal Gmail account
send mail <message_body> <target_mail_id> : Sends mail to anyone using a voice command
open whatsapp : Opens WhatsApp for the user
open facebook : Opens Facebook for the user
find facebook <query> : Finds people on Facebook
search live train status <train_no> : Shows the live status of the train number provided by the user
open zoom : Opens the Zoom application
open sublime text : Opens the Sublime Text editor
open calculator : Opens the calculator app
open notepad : Opens Notepad on the desktop
handle file <mode_to_handle> : Reads, writes and saves text files based on the user-selected mode
play music/I am sad : Plays music from folders
movie <movie_name> : Opens the media player and starts the named movie
take screenshot <file_name> : Takes a screenshot and saves it under the user-provided file name in the specified folder
change wallpaper : Changes the desktop background
take me to my childhood : Shows a childhood photo of the user if there is a specified folder containing such photos
set alarm <hours> <minutes> <am/pm> : Sets an alarm and rings the alarm tone when the set time is reached
set timer <seconds> : Sets a timer for the user-provided number of seconds and rings a warning tone when the deadline is reached
read breaking news : Reads the top 10 global news headlines of the day aloud
india corona cases : Shows results for the top 5 states in India
corona update : Shows the total global corona figures, such as the number of infected, dead and recovered people
take photo : Captures a picture and saves it to the specified folder
record video : Records video using the desktop camera and saves it to the specified folder
record audio : Records audio using the microphone and saves the recorded file in the specified folder
help me <problem_statement> : Sends WhatsApp messages to the emergency contact through a voice command
police <problem_statement> : Sends mail to the local police station mail ID mentioning the problem through a voice command
restart my pc : Asks the user to confirm the restart and acts accordingly
shutdown my pc : Asks the user to confirm switching the system off and acts accordingly
exit : Quits the application
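One way such a command list could be wired to code is a lookup table that maps each command phrase to a handler function, as sketched below. This is an assumption about the design, not the project's actual implementation; the handler names and the replies they return are hypothetical, and only two of the commands are shown.

```python
def tell_name(_argument):
    """Handler for 'who are you'."""
    return "I am Jarvis"

def google_search(argument):
    """Handler for 'google search <query>'; here it only reports the query."""
    return "Searching Google for " + argument

# Command phrase -> handler function (only two commands shown)
COMMANDS = {
    "who are you": tell_name,
    "google search": google_search,
}

def handle(utterance):
    """Find the command the recognized text starts with and run its handler."""
    text = utterance.lower().strip()
    # Try longer phrases first so that a longer command wins over a shorter prefix
    for phrase in sorted(COMMANDS, key=len, reverse=True):
        if text.startswith(phrase):
            argument = text[len(phrase):].strip()
            return COMMANDS[phrase](argument)
    return "Sorry, I did not understand that"

print(handle("Google search python tutorials"))
print(handle("Who are you"))
```

Adding a new voice command then only needs a new handler function and one more entry in the table.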
9. Uses of Speech Recognition program
Speech recognition is used for two main purposes. The first is dictation, which in the
context of speech recognition means the translation of spoken words into text. The
second is controlling the computer and its various applications by voice.
Writing by voice lets a person write 150 words per minute or more, if he or she
can speak that quickly. Speech recognition programs thus help users get more
done in a short time and save effort.
10. Applications
10.1 From medical perspective :
People with disabilities can benefit from speech recognition programs. Speech
recognition is especially useful for people who have difficulty using their hands;
they can use it to operate computers. Speech recognition is also used in deaf
telephony, for example voicemail to text.
10.2 From military perspective :
Speech recognition is important from a military perspective. In the air force, speech
recognition has definite potential for reducing pilot workload. Besides the air
force, such programs can also be used in training, helicopters, battle management and
other applications.
10.3 From education perspective :
Individuals with learning disabilities who have problems with thought-to-paper
communication can benefit from the software. Some other application areas of speech
recognition technology are described above.
11. Some factors that may disturb functionalities of the application :
Homonyms : These are words that are spelled differently and have different
meanings but sound the same, for example 'to' and 'two', 'be' and
'bee'. It is a challenge for a machine to distinguish between such
words that sound alike.
Overlapping Speeches : A second challenge is to understand the speech uttered
by the user. The machine often picks up the wrong command because of
the way a word is uttered.
12. The future of Speech Recognition :
Accuracy will become better and better.
Dictation by speech recognition will gradually become accepted.
By combining speech recognition with AI, a system as intelligent as a human
may be developed.
In future, corporate tasks can probably be automated using speech recognition
and Selenium.
13. References :
1. https://pypi.org/
2. https://www.geeksforgeeks.org/
3. https://github.com/github
4. https://www.kdnuggets.com/2020/06/easy-speech-text-python.html
5. https://www.analyticsvidhya.com/blog/2019/07/learn-build-first-speech-to-text-model-python/