2. Contents
Introduction
How does an intelligent personal assistant work?
Existing system and proposed system
Dataflow diagrams
Automatic Speech Recognition System Model in Google API
Use case diagrams
Modules
Screenshots
Requirements
Conclusion
3. Introduction
A virtual assistant is software based on artificial intelligence. It uses
the device’s microphone to receive voice requests and plays its voice
output through the speaker. The most interesting part, however, happens
between these two actions.
It combines several technologies: voice recognition, voice analysis and
language processing.
It is developed entirely in Python, a powerful and widely used language.
4. How does an intelligent personal assistant work?
The user asks the personal assistant to perform a task.
The natural-language audio signal is converted into digital data that the
software can analyze.
This data is compared with the software’s database, using an algorithm
to find a suitable answer.
This database is located on distributed servers in cloud networks. For this
reason, the assistant needs a reliable Internet connection.
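The first two steps above (capturing the request and converting it to text) could be sketched in Python roughly as follows. This is a minimal sketch, assuming the third-party SpeechRecognition package (imported as speech_recognition) with PyAudio for microphone access and a working Internet connection; the function name transcribe_from_microphone is hypothetical and not taken from the project.

```python
def transcribe_from_microphone():
    """Capture one utterance from the default microphone and return it as text.

    Returns None when the speech cannot be understood or the online
    recognition service cannot be reached.
    """
    # Third-party package, imported lazily so this file loads without it.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Calibrate the energy threshold against background noise.
        recognizer.adjust_for_ambient_noise(source)
        # Block until the user has spoken a phrase.
        audio = recognizer.listen(source)

    try:
        # Sends the audio to Google's web speech service.
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return None  # audio was captured but not intelligible
    except sr.RequestError:
        return None  # service unreachable, e.g. no Internet connection
```

Because recognition happens on remote servers, the RequestError branch is the one that fires when the Internet connection mentioned above is missing.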
5. EXISTING SYSTEM V/S PROPOSED SYSTEM
Existing System | Proposed System
Usage statistics and user data are sent to the developer | Does not collect any user data
Installation required | Installation required
User cannot edit or change the modules | User can edit and add new modules
Not free software | Free software
- | Lightweight
- | Simple user interface
7. DFD - 1
[Figure: data flow diagram. Elements: User, Personal voice assistant, Database, Computer. Flows: Voice Command, Voice to text, Action perform, Perform action.]
8. DFD - 2
[Figure: data flow diagram. Elements: User, Microphone (this will convert voice into binary), Google voice API (this will convert voice data into text form), Computer. Flows: Voice audio data, Flow sensor value, Perform action.]
9. Automatic Speech Recognition System Model in Google API
[Figure: block diagram. Speech signal -> Feature Extraction -> Decoder -> Recognized words. The Decoder uses the Acoustic models, Pronunciation Dictionary and Language Models.]
10. Feature Extraction
Feature extraction is a common first step that extracts a set of features
from the speech signal.
Classification is then carried out on this set of features instead of on
the speech signal itself.
The feature extraction stage seeks to provide a compact representation of
the speech waveform. This form should minimize the loss of information
that discriminates between words, and provide a good match with the
distributional assumptions made by the acoustic models.
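To make the idea of a compact representation concrete, here is a minimal sketch that reduces each short frame of audio to two numbers, log energy and zero-crossing rate. These two features are a simplified stand-in chosen for illustration; the source does not say which features the Google API uses, and real recognizers typically rely on richer spectral features. The function names, frame length and hop size are assumptions.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def frame_features(frame):
    """Return (log energy, zero-crossing rate) for one frame."""
    energy = sum(x * x for x in frame) / len(frame)
    log_energy = math.log(energy + 1e-10)  # small offset avoids log(0)
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    return log_energy, zcr

def extract_features(samples, frame_len=400, hop=160):
    """Turn a waveform into one small feature tuple per frame."""
    return [frame_features(f) for f in frame_signal(samples, frame_len, hop)]

# Synthetic input: one second of a 440 Hz tone sampled at 16 kHz.
rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
features = extract_features(tone)
```

Each 400-sample frame is replaced by a pair of numbers, so later stages work on a far smaller input than the raw waveform.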
11. ACOUSTIC MODELS
• An acoustic model represents the relationship between an audio signal and phonemes
• A phoneme is one of the smallest units of speech that distinguish one word from another
PRONUNCIATION DICTIONARY
• The act or result of producing the sounds of speech, including articulation, stress, and intonation
• A phonetic transcription of a given word, sound, etc.
• An accepted standard of the sound and stress patterns of a word, phrase, etc.
LANGUAGE MODELS
• The language model provides context to distinguish between words and phrases that sound similar.
For example, in American English the phrases “recognize speech” and “wreck a nice beach” sound
similar but mean different things.
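How a language model separates two similar-sounding phrases can be illustrated with a toy bigram model. This is a minimal sketch under stated assumptions: the four-sentence corpus is invented for illustration, add-one smoothing is one simple choice among many, and nothing here reflects the model actually used inside the Google API.

```python
import math
from collections import Counter

# Tiny invented training corpus (an assumption, for illustration only).
corpus = [
    "please recognize speech",
    "computers recognize speech",
    "we recognize speech quickly",
    "a nice beach",
]

def train_bigrams(sentences):
    """Count single words and adjacent word pairs, with a start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    """Log probability of a sentence under add-one smoothed bigrams."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split()
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1)
                          / (unigrams[prev] + vocab_size))
    return total

unigrams, bigrams = train_bigrams(corpus)
score_speech = log_prob("recognize speech", unigrams, bigrams)
score_beach = log_prob("wreck a nice beach", unigrams, bigrams)
```

With this corpus, score_speech comes out higher than score_beach, so a decoder weighing both candidates would prefer “recognize speech”.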
12. Use case diagrams
[Figure: use case diagram. Actor: User. Use cases: Input voice, Send mail, Turn Wi-Fi on/off, Wikipedia search, Read.]
13. [Figure: sequence diagram. Participants: User, Microphone, Google API, Computer. Steps: Open Personal Assistant, Start Mic, Wait until user speaks, Accessing G-API, Receive data, Convert audio to text, Match text with action, Perform action, Voice response, Voice / Text Response.]
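The “Match text with action” step in the sequence above could be sketched as a simple keyword lookup. This is a minimal sketch, assuming plain substring matching; the handler functions, keywords and return strings are hypothetical placeholders, not the project's actual commands.

```python
def send_mail(text):
    """Placeholder handler: a real one would compose and send an email."""
    return "sending mail"

def toggle_wifi(text):
    """Placeholder handler: a real one would switch Wi-Fi on or off."""
    return "toggling wifi"

def search_wikipedia(text):
    """Placeholder handler: a real one would look up and read a summary."""
    return "searching wikipedia"

# Ordered (keyword, handler) pairs; the first keyword found wins.
COMMANDS = [
    ("mail", send_mail),
    ("wifi", toggle_wifi),
    ("wikipedia", search_wikipedia),
]

def match_action(text):
    """Return the handler whose keyword occurs in the recognized text."""
    lowered = text.lower()
    for keyword, handler in COMMANDS:
        if keyword in lowered:
            return handler
    return None  # no known command in the text

def perform(text):
    """Match the text to an action and run it, or report a miss."""
    handler = match_action(text)
    if handler is None:
        return "sorry, I did not understand"
    return handler(text)
```

Keeping the commands in one table means a new action can be added by writing a handler and appending one entry, which fits the goal of letting users add modules.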
14. MODULES
Speech recognition
Process and system utilities ( psutil )
PlaySound
SMTP Protocol client ( smtplib )
Google Text To Speech ( gtts )
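As an illustration of how some of these modules might fit together, the sketch below speaks a response with gtts and playsound and sends an email with smtplib. It is a minimal sketch under stated assumptions: the function names, the response.mp3 file name and the host, port and credential parameters are hypothetical, and the third-party packages gTTS and playsound must be installed for speak to run.

```python
import smtplib
from email.message import EmailMessage

def speak(text, path="response.mp3"):
    """Convert text to speech, save it as an MP3 file and play it."""
    # Third-party packages, imported lazily so this file loads without them.
    from gtts import gTTS
    from playsound import playsound

    gTTS(text=text, lang="en").save(path)  # needs Internet access
    playsound(path)

def build_mail(sender, recipient, subject, body):
    """Create an email message object ready to be sent."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def send_mail(msg, host, port, user, password):
    """Send a message through an SMTP server using STARTTLS."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)
```

A call such as build_mail("me@example.com", "you@example.com", "Hello", "Sent by voice") produces the message, which send_mail then delivers once real server details are supplied.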
15. Requirements
Software requirements
PyCharm IDE / Visual Studio Code
Inno Setup Compiler
PyInstaller
Python 3.8.2 and its submodules
Hardware requirements
Intel Core i3
4 GB RAM
30 GB hard drive space
16. Conclusion
The Voice Controlled Personal Assistant System uses natural language
processing and can be integrated with artificial intelligence techniques to
build a smart assistant that controls the computer and its applications and
even answers user queries through web searches. It can be designed to reduce
the human effort needed to interact with many other subsystems, which would
otherwise have to be operated manually. In doing so, the system aims to
make everyday life more comfortable.