A simple natural language interface application that launches applications and shows user information based on voice input, processed using natural language processing concepts
Jeeves - natural language interface application
1. JEEVES - A NATURAL LANGUAGE
PROCESSING APPLICATION FOR
ANDROID
Presented by,
Anshul Agarwal (1PI08IS017)
Karan Harsh Wardhan (1PI08IS045)
Pavani Deepak Mehta (1PI08IS070)
Jan 2012 - May 2012 Dept. of ISE 1
2. AGENDA
• Project Overview
• Relevance
• Requirements
• Introduction
• Technologies Used
• System Design
3. AGENDA
• Software Development Strategy
• Implementation
• Difficulties
• Screenshots
• Conclusion
• Future Enhancements
• References
4. PROJECT OVERVIEW
• Goal of the Project:
– To design a Natural Language Processing (NLP)
Interface Application for Android Platform
• Scope of the Project
– Input from user using Voice Recognition
– Application should process input and recognize user
commands spoken in natural English
– Basic features should be to make calls and send
messages to contacts
– Advanced features such as showing weather, Google
search and launching apps
5. RELEVANCE
• Allows for more intuitive human-computer
interaction
• More convenient if computer can automate
tasks usually performed by humans
• Goal is to reduce keypad usage as much as
possible, and allow user to speak naturally
• Can help the disabled to easily use phones
• Acts as a virtual assistant, enhancing
productivity
7. INTRODUCTION
• NLP is a field of computer science and
linguistics concerned with the interactions
between computers and human languages
• Aim is to design software that will analyze,
understand, and generate languages that
humans use naturally
• Eventually, users will be able to address the
computer as they would talk to another person
8. TECHNOLOGIES USED - ANDROID
WHY ANDROID?
• Android is an open-source, Linux-based
operating system for mobile devices
• Requires developer license only to publish,
not develop
• Google voice recognition is built into most
Android devices
• Layered architecture facilitates rapid
development of applications
9. TECHNOLOGIES USED - ANDROID
• ANDROID SYSTEM ARCHITECTURE
10. TECHNOLOGIES USED - ANDROID
INTENTS
• Messaging facility for late run-time binding
between components
• Intent object is a passive data structure
holding description of operation to be
performed
• Three core components of an application —
activities, services, and broadcast receivers —
are activated through intents
11. TECHNOLOGIES USED - ANDROID
INTENTS
• Two types - Implicit And Explicit
• Intent filters are implemented to help with intent
resolution
• Used to turn apps into high-level libraries and
make code modular and reusable
Intent intent = new Intent(appPackageName); // appPackageName: String placeholder for "App Package name"
startActivity(intent);
• Additional information can be attached as extras
intent.putExtra("title", "Hello codeandroid");
12. TECHNOLOGIES USED – HTTP POST
• POST request is used to send data to server
• The string detected by voice recognizer is
passed to server using this method
• Accomplished using the built-in HttpCore API, i.e. the
org.apache.http package
• The server performs processing and returns a
JSON response
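The POST step above can be sketched in Python for illustration; the actual client is an Android app using org.apache.http. The server URL and the form field name "text" are hypothetical, since the slides do not specify them. The request is only built here, not sent.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint; the slides do not name the real NLP server URL.
NLP_SERVER_URL = "http://example.com/nlp"


def build_post_request(recognized_text, url=NLP_SERVER_URL):
    """Build (but do not send) a POST request carrying the recognized string."""
    # "text" is an assumed form field name.
    body = urllib.parse.urlencode({"text": recognized_text}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


request = build_post_request("call smith")
print(request.get_method(), request.data)
```

Passing the request to urllib.request.urlopen would send it; the response body would then be the JSON string returned by the server.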
13. TECHNOLOGIES USED - JSON
• JavaScript Object Notation (JSON) is a
lightweight data-interchange format
• Based on a subset of the JavaScript
Programming Language
• Is completely language independent
• In Java, org.json.JSONObject is used to parse
these strings
JSONObject responseJSON = new JSONObject(responseString);
String workId = responseJSON.getString("id");
14. TECHNOLOGIES USED - JSON
• Example:
{"menu": {
"id": "file",
"value": "File",
"popup": {
"menuitem": [
{"value": "New", "onclick": "CreateNewDoc()"},
{"value": "Open", "onclick": "OpenDoc()"},
{"value": "Close", "onclick": "CloseDoc()"} ] } }}
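For comparison, the same example can be parsed with Python's standard json module, which is what a Python server would typically use to produce or inspect such strings. This is a sketch; the variable names are illustrative.

```python
import json

response_string = """
{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}
"""

# Objects become dicts and arrays become lists.
parsed = json.loads(response_string)
menu_id = parsed["menu"]["id"]
item_labels = [item["value"] for item in parsed["menu"]["popup"]["menuitem"]]
print(menu_id, item_labels)
```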
15. TECHNOLOGIES USED –
VOICE RECOGNITION
WHY GOOGLE VOICE RECOGNITION?
• Focus of project was not automatic speech recognition
• Pre-installed on most Android phones, easy to access
• Requires no special permission/payment to be used
• Developed, optimized and maintained by Google since
2007
• Occurs off-site, i.e. on Google’s servers, so no “weighty”
voice recognition software needs to be installed on the phone
• Only the android.speech.RecognizerIntent class is needed
16. TECHNOLOGIES USED –
VOICE RECOGNITION
HOW DOES RECOGNITION WORK?
• Google uses artificial intelligence algorithms to
recognize spoken sentences
• Stores voice data anonymously for analysis
purposes
• Cross matches spoken data with written queries
on server
• Key problems of computational power, data
availability and managing large amounts of
information are handled with ease
17. TECHNOLOGIES USED - NLTK
NATURAL LANGUAGE TOOLKIT
• Open-source suite of libraries for NLP for the
Python language
• Includes graphical demonstrations and sample
data
• Provides NLP APIs, such as for importing a
corpus, loading grammar from a file, etc.
21. SOFTWARE DEVELOPMENT STRATEGY
• Software Development Strategy used is
Extreme Programming
• It is a type of agile software development
• Advocates frequent releases in multiple short
development cycles rather than one long cycle
• Involves programming in groups and doing
extensive code review
• Works best with smaller groups
23. IMPLEMENTATION
WHAT WE HAVE IMPLEMENTED
• Corpus – modification of Brown
• Tokenizer
• Part-of-Speech tagger
• Grammar
• Syntactic and Semantic Analysis
• Client application on Android
– Takes voice input from user, converts to text, passes
text to NLP server, receives id from server and
launches corresponding intent
24. IMPLEMENTATION
• Client application starts up and prompts user
to input using Google Voice Recognition
• Input data is sent to Google servers for
processing and text is returned to client
• Input text is now passed to the NLP server for
processing using HTTP POST
• Server performs Natural Language Processing
25. IMPLEMENTATION
• Steps involved in NLP:
– Lexical Analysis: converts sequence of characters into
a sequence of tokens
– Morphological Analysis: identification, analysis and
description of the structure of a given
language's linguistic units
– Syntactic Analysis: analyzing text, made up of a
sequence of tokens, to determine its grammatical
structure
– Semantic Analysis: relating syntactic structures from
the levels of phrases and sentences to their language-
independent meanings
26. IMPLEMENTATION
• A corpus is a large and structured set of texts
• Used in part-of-speech tagging to tag words as
parts of a sentence
• Tags stored along with the words in the corpus
• We have modified the Brown corpus to include
more relevant phrases and commands
• Contains data from books, news articles, journals,
etc.
• TaggedCorpusReader needed to import the
corpus
27. IMPLEMENTATION
• During lexical analysis, the string is split up into
various tokens, the separator being space
• Tokens, which are words in this case, are passed
to a part-of-speech (POS) tagger
• POS tagger assigns a tag to each word depending
on what part of speech it is
• Examples of tags: adjective (ADJ), common noun (NN),
proper noun (NP), verb (VB), etc.
• Custom tag for commands (CMD) created to
recognize application commands such as call,
message, launch, etc.
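A minimal sketch of the tokenizing and tagging steps above, assuming a tiny hand-made lexicon in place of the trained tagger. The LEXICON contents, the AT tag for articles and the function names are illustrative, not taken from the project.

```python
# Tiny hand-made lexicon standing in for a tagger trained on the corpus.
# CMD is the custom command tag; the other tags follow Brown-style names.
LEXICON = {
    "call": "CMD", "message": "CMD", "launch": "CMD",
    "the": "AT", "weather": "NN", "smith": "NP",
}


def tokenize(sentence):
    """Lexical analysis: lower-case the input and split it on whitespace."""
    return sentence.lower().split()


def tag(tokens, default_tag="NN"):
    """Look each token up in the lexicon; unknown words get the default tag."""
    return [(token, LEXICON.get(token, default_tag)) for token in tokens]


print(tag(tokenize("Call Smith")))
```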
28. IMPLEMENTATION
• Different POS taggers available
• Simplest is Default Tagger
– Tagging Accuracy is 13%
• Unigram Tagger
– Tagging Accuracy is 81%
• Bigram Tagger
– Used alone, accuracy is 10%, but when used with
Unigram Tagger as a backoff, accuracy is 85%
• Trigram Tagger
– Accuracy when used with the previous Bigram Tagger as
backoff is 91.3%
29. IMPLEMENTATION
• As it gives the highest accuracy, a model for
tagging is used as follows:
– First, a Default Tagger is used, which assigns a
default noun tag
– Then, a Unigram Tagger is used, which uses
Default Tagger as backoff
– Then, a Bigram Tagger is used, which uses the
above Unigram Tagger as backoff
– Lastly, a Trigram Tagger is used, which uses the
above Bigram Tagger as backoff
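The backoff chain above can be illustrated with a small pure-Python stand-in. This is not NLTK's implementation; the class names, the two-sentence training set and the AT tag are assumptions made for the sketch. Each n-gram tagger looks up the previous n-1 tags plus the current word and defers to its backoff when that context was never seen.

```python
from collections import Counter, defaultdict


class DefaultTagger:
    """Assigns the same tag to every word."""

    def __init__(self, tag):
        self.tag = tag

    def choose(self, words, index, history):
        return self.tag


class NgramTagger:
    """Picks a tag from the previous n-1 tags plus the word itself and
    defers to `backoff` when that context was not seen in training."""

    def __init__(self, n, train_sents, backoff):
        self.n = n
        self.backoff = backoff
        counts = defaultdict(Counter)
        for sent in train_sents:
            tags = [t for _, t in sent]
            for i, (word, tag) in enumerate(sent):
                counts[self._context(word, tags, i)][tag] += 1
        # Keep the most frequent tag for each observed context.
        self.table = {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

    def _context(self, word, prev_tags, i):
        start = max(0, i - (self.n - 1))
        return (tuple(prev_tags[start:i]), word)

    def choose(self, words, index, history):
        ctx = self._context(words[index], history, index)
        if ctx in self.table:
            return self.table[ctx]
        return self.backoff.choose(words, index, history)


def tag(tagger, words):
    """Tag left to right, feeding earlier tags back in as context."""
    history = []
    for i in range(len(words)):
        history.append(tagger.choose(words, i, history))
    return list(zip(words, history))


# Toy training data (hypothetical); the project trains on a modified Brown corpus.
train = [
    [("call", "CMD"), ("smith", "NP")],
    [("make", "VB"), ("a", "AT"), ("call", "NN")],
]
t0 = DefaultTagger("NN")
t1 = NgramTagger(1, train, backoff=t0)
t2 = NgramTagger(2, train, backoff=t1)
t3 = NgramTagger(3, train, backoff=t2)
print(tag(t3, ["call", "john"]))
print(tag(t3, ["make", "a", "call"]))
```

With this toy data, "call" is tagged CMD at the start of a sentence but NN after "make a", and the unseen word "john" falls all the way back to the default noun tag.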
30. IMPLEMENTATION
• Taggers need to be trained with the corpus so
that they can recognize words and tag them
accordingly
• Training needs time
• Taggers can be pre-trained with the data to
save time
• Pickle files in Python are used to save such
pre-trained taggers
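A minimal sketch of saving and restoring a trained tagger with Python's pickle module. A plain dictionary stands in for the tagger object here, and the file name is hypothetical; any picklable tagger would be handled the same way.

```python
import os
import pickle
import tempfile

# Stand-in for a trained tagger; a real tagger object is pickled the same way.
trained_tagger = {"call": "CMD", "smith": "NP", "weather": "NN"}

# Hypothetical file location, placed in a temporary directory for this sketch.
path = os.path.join(tempfile.mkdtemp(), "tagger.pickle")

# Save once, after the slow training step.
with open(path, "wb") as f:
    pickle.dump(trained_tagger, f)

# Load on later runs instead of retraining.
with open(path, "rb") as f:
    restored_tagger = pickle.load(f)

print(restored_tagger == trained_tagger)
```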
31. IMPLEMENTATION
• Tagger uses statistical and probabilistic
techniques to assign tags
• Tagged tokens passed to parser
• Parser makes sure sentence conforms to rules
of grammar, hence only grammatically valid
sentences are accepted
• Predefined commands specify the
functionality they represent
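The grammar check can be sketched as a pattern over the sequence of tags. The slides do not show the project's grammar, so the rule below (a command, an optional article, then one or more nouns) is a hypothetical stand-in, and a regular expression is used here in place of a full parser.

```python
import re

# Hypothetical rule: CMD, an optional article (AT), then one or more nouns.
COMMAND_PATTERN = re.compile(r"CMD( AT)?( (NN|NP))+")


def is_valid(tagged_tokens):
    """Accept a sentence only if its whole tag sequence matches the rule."""
    tag_sequence = " ".join(tag for _, tag in tagged_tokens)
    return COMMAND_PATTERN.fullmatch(tag_sequence) is not None


print(is_valid([("call", "CMD"), ("smith", "NP")]))
print(is_valid([("smith", "NP"), ("call", "CMD")]))
```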
32. IMPLEMENTATION
• (word, tag) tuples are parsed to recognize
commands and entities to which those
commands apply
• Command words and receivers are extracted
• A unique id is returned to the client along with
the receiver
• The id represents the functionality required, and
the receiver indicates the variable parameter to
which the functionality is applied
• This id is used to launch intents which also take
into account the parameters
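A minimal sketch of how the server might turn tagged tokens into the id and receiver returned to the client. The id values, the "receiver" key and the "0" value for unrecognized input are assumptions; only the "id" key appears in the client code shown earlier.

```python
import json

# Hypothetical ids; the slides do not list the real values.
COMMAND_IDS = {"call": "1", "message": "2", "launch": "3"}


def build_response(tagged_tokens):
    """Extract the command word and its receiver, and return them as JSON."""
    command = next((word for word, tag in tagged_tokens if tag == "CMD"), None)
    if command not in COMMAND_IDS:
        # "0" marks an unrecognized command (assumed convention).
        return json.dumps({"id": "0", "receiver": ""})
    # Nouns after tagging are treated as the receiver of the command.
    receiver = " ".join(word for word, tag in tagged_tokens if tag in ("NN", "NP"))
    return json.dumps({"id": COMMAND_IDS[command], "receiver": receiver})


print(build_response([("call", "CMD"), ("smith", "NP")]))
```

On the client, the id would select which intent to launch, and the receiver would be passed along as its parameter.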
33. DIFFICULTIES
• Modification of corpus to suit the application
needs
• Choosing a POS tagger
• Advanced semantics – multiple phrasings of
the same command, e.g., call, make a call, make
a phone call; message, send a message, etc.
• Separating tagger training from request processing
39. CONCLUSION
• Aim is to create a Natural Language Interface
application which acts as a virtual assistant
• App is able to perform basic functions such as
calling and messaging
• Also performs advanced functions like search,
showing the weather and launching apps
• Works for multiple semantics for the same
command, spoken in natural English
• Responds quickly, taking only a few seconds
40. FUTURE ENHANCEMENTS
• Include more semantics for the same
command
• Increasing accuracy for longer sentences
• Processing on mobile device for short basic
commands such as ‘call smith’
• Providing custom settings to user
41. REFERENCES
[1] Steven Bird, Ewan Klein, and Edward Loper. Natural Language Processing with
Python. United States of America: O’Reilly Media, Inc. June 2009
[2] Edward Loper. “NLTK: Building a Pedagogical Toolkit in Python”, Department of
Computer and Information Science, University of Pennsylvania, Philadelphia, PA
19104-6389, USA
[3] Cheng Juan. “Research and Implementation English Morphological Analysis and
Part-of-Speech Tagging”, Normal Education Department, Bohai Shipbuilding Vocational
College, Huludao,China
[4] W. Wang, J. Auer, R. Parasuraman, I. Zubarev, D. Brandyberry and M. P. Harper. “A
Question Answering System Developed as a Project in a Natural Language Processing
Course”, Purdue University, West Lafayette IN.
[5] Ivan Archeurov, “Architecture of an NLP engine” Internet:
https://sites.google.com/site/iakcheurov/my-articles/architecture-of-nlp-engine
[6] World Weather Online, “How Free Local Weather API Works”, Internet:
http://www.worldweatheronline.com/weather-api.aspx
[7] Natural Language Toolkit, Internet: http://www.nltk.org/
[8] Andrew Montalenti. “Just Enough NLP With Python”, Internet:
http://pixelmonkey.org/pub/nlp-training/
http://json.org/
JSON is based on a subset of JavaScript defined in: http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf
http://java.sun.com/products/java-media/speech/forDevelopers/jsapi-guide/Recognition.html (to demo the complexity of the idea and why it was abandoned)
http://developer.android.com/resources/articles/speech-input.html
http://www.slate.com/articles/technology/technology/2011/04/now_youre_talking.single.html
Cohen says that building just one part of the speech-recognition system required "roughly 70 CPU-years" of computer time, which Google did in a day!
http://googleblog.blogspot.in/2010/10/goodbye-to-old-friend-1-800-goog-411.html (note the date when it was started)
Brown – nearly 1 million words
E.g., the parameter may specify the contact to be called or messaged, or the app to be launched