Implementing Task-Oriented Dialogues on Turtlebot 2
Mahima Ghale, Caitlin Coggins, Rebecca Kim, Raeesa Mehjabeen
Interactive Computing Research Lab
Mount Holyoke College, Department of Computer Science
Professor Heather Pon-Barry
Text to Speech (TTS)
A text-to-speech (TTS) system is a speech synthesizer that converts text input into speech output. Google TTS was used because, of all the engines tried during this summer's research, its voice output flowed most smoothly and sounded the most human-like.

Figure 4. The process of running Google TTS on the Turtlebot

Future Works
Future work for this research includes improving speech recognition by using acoustic modeling in Pocketsphinx, switching to Kaldi, and/or improving input audio quality, for example by placing the Kinect on top of the Turtlebot. Dialogues can be made more natural by finding ways to signal the user (with LEDs, a beep, etc.) when the Turtlebot is ready to listen, by using mixed-initiative interaction, and by varying patterns in the dialogue. Localization and navigation will need to be refined by customizing the SLAM algorithm so that the Turtlebot can recover quickly and efficiently from sudden obstacles.
Acknowledgements
We would like to thank Professor Heather Pon-Barry for providing us with the
opportunity to work on this project, the Clare Boothe Luce Fund and Mount Holyoke
LYNK Fund for providing necessary funding, and the Computer Science Department
for constant help and support. We would also like to thank Joydeep and his team in
AMRL at the University of Massachusetts for helping us set up the Turtlebot.
Navigation, Mapping, and Localization
For Navi to go to specific rooms, it must be able to create a map (mapping), read the map, keep track of its position in the map (localization), and calculate a path to the desired destination (navigation). For this purpose, we used a ROS package called turtlebot_navigation, which implements SLAM (Simultaneous Localization and Mapping).
The Kinect's 3D sensors detect walls and anything else it considers an obstacle, and these detections are saved as a map. During the research, several places inside the lab were marked with room numbers for convenience. Given a map of the environment and Navi's initial position, the turtlebot_navigation package calculates a path to the destination.
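To make the planning step concrete, here is a minimal sketch of finding a path from a known start cell to a labeled room on a saved map. It assumes a hypothetical occupancy grid (`GRID`, where `#` marks an obstacle) and hypothetical room labels (`ROOMS`), and uses plain breadth-first search. This is a toy illustration only: turtlebot_navigation plans over a costmap with its own planner, not with this code.

```python
from collections import deque

# Hypothetical occupancy grid: '#' is an obstacle, '.' is free space.
GRID = [
    "#########",
    "#...#...#",
    "#.#.#.#.#",
    "#.#...#.#",
    "#########",
]

# Hypothetical labels for places marked on the map, as (row, col) cells.
ROOMS = {"room 1": (1, 1), "room 2": (1, 7)}


def plan_path(grid, start, goal):
    """Breadth-first search over free cells; returns the cell path or None."""
    rows, cols = len(grid), len(grid[0])
    came_from = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk back from the goal to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            free = 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#"
            if free and (nr, nc) not in came_from:
                came_from[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # goal unreachable from start


if __name__ == "__main__":
    path = plan_path(GRID, ROOMS["room 1"], ROOMS["room 2"])
    print(len(path), path)
```

Because every move costs the same on this grid, breadth-first search already returns a shortest path; a real planner also weighs distance from obstacles, which is why it works on a costmap rather than a plain grid.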
Dialogues
Regular expressions (regex), which can find a particular word or group of words in a phrase or sentence, enable Navi to understand the user as long as a keyword is found in the user's utterance. This allows the user to answer Navi's questions freely, without having to follow a dialogue script. The conversation was converted into a Turtlebot-readable format using GraphML, an XML representation of a graph made up of nodes and edges.

Figure 2. Part of the GraphML graph for the Turtlebot's dialogue. The yellow boxes are nodes (the Turtlebot's speech) and the thin arrows with text labels are edges (keywords from the user's speech).
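The node-and-edge structure in Figure 2 can be sketched as follows: nodes hold what Navi says, and edges hold the keyword that moves the dialogue forward. This is a hypothetical example, not the authors' file. It assumes plain GraphML `<data>` elements with made-up keys `say` and `kw`; a graph exported from a graphical editor may store its labels in editor-specific elements instead. Parsing uses only Python's standard `xml.etree.ElementTree`.

```python
import re
import xml.etree.ElementTree as ET

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hypothetical dialogue fragment: each node holds what Navi says,
# each edge holds a keyword that moves the dialogue to the next node.
GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="say" for="node" attr.name="say" attr.type="string"/>
  <key id="kw" for="edge" attr.name="keyword" attr.type="string"/>
  <graph id="dialogue" edgedefault="directed">
    <node id="start"><data key="say">Should I deliver an item or guide you?</data></node>
    <node id="deliver"><data key="say">Which room should I deliver it to?</data></node>
    <node id="guide"><data key="say">Which room are you looking for?</data></node>
    <edge source="start" target="deliver"><data key="kw">deliver</data></edge>
    <edge source="start" target="guide"><data key="kw">guide</data></edge>
  </graph>
</graphml>"""


def load_dialogue(text):
    """Parse GraphML into {node_id: prompt} and {node_id: [(keyword, target)]}."""
    graph = ET.fromstring(text).find("g:graph", NS)
    prompts = {
        node.get("id"): node.find("g:data[@key='say']", NS).text
        for node in graph.findall("g:node", NS)
    }
    transitions = {}
    for edge in graph.findall("g:edge", NS):
        keyword = edge.find("g:data[@key='kw']", NS).text
        transitions.setdefault(edge.get("source"), []).append(
            (keyword, edge.get("target"))
        )
    return prompts, transitions


def next_node(transitions, current, utterance):
    """Follow the first edge whose keyword is spotted; otherwise stay put."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    for keyword, target in transitions.get(current, []):
        if keyword in words:
            return target
    return current  # nothing spotted: Navi can repeat the same question


if __name__ == "__main__":
    prompts, transitions = load_dialogue(GRAPHML)
    state = next_node(transitions, "start", "Could you guide me, please?")
    print(prompts[state])  # Which room are you looking for?
```

Staying on the current node when no keyword is found is one simple design choice; it lets the robot re-ask instead of guessing.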
Figure 5. Kinect, with its labeled parts, used for ASR as well as for mapping and navigation
Abstract
Turtlebot 2 is a service robot that should be able to perform tasks for its users. The goal for this summer was to enable it to deliver items or guide a visitor to a room. To make this possible, the main focus was on behavior and speech recognition, which allows users to ask the TurtleBot for help rather than typing in instructions on a computer.
Figure 1. Turtlebot 2, named Navi, in the Interactive Computing Research Lab (ICRL)
Figure 5. The map of the Interactive Computing Research Lab (ICRL), created by driving Navi around through teleoperation, using a keyboard to control where it moves.
Automatic Speech Recognition (ASR)
Pocketsphinx is an open-source, speaker-independent, continuous speech recognition engine. Although it is more challenging to install and use, Pocketsphinx has much better recognition quality than Rospeex. Users can fine-tune Pocketsphinx by creating a new dictionary, which lists the pronunciations of the words that the Turtlebot can recognize. A grammar, which specifies the word sequences the recognizer should expect, further narrows down which dictionary words were spoken.
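The dictionary and grammar described above can be sketched as two small text artifacts plus a consistency check. This is an illustrative sketch, not the authors' files: the vocabulary, the ARPAbet pronunciations (CMUdict style with stress marks removed), and the JSGF rules are hypothetical, and the helper functions are made up for this example. The check catches a common mistake, a grammar word with no pronunciation in the dictionary.

```python
import re

# Hypothetical vocabulary with ARPAbet pronunciations (CMUdict style, stress removed).
PRONUNCIATIONS = {
    "go": "G OW",
    "to": "T UW",
    "room": "R UW M",
    "one": "W AH N",
    "two": "T UW",
    "three": "TH R IY",
    "yes": "Y EH S",
    "no": "N OW",
}

# Hypothetical JSGF grammar: the only word sequences the decoder should expect.
JSGF = """#JSGF V1.0;
grammar navi;
public <command> = go to room <number> | yes | no;
<number> = one | two | three;
"""


def make_dictionary(pronunciations):
    """Render one 'word PHONES' line per entry, as in a Sphinx-style .dic file."""
    return "".join(f"{w} {p}\n" for w, p in sorted(pronunciations.items()))


def grammar_words(jsgf):
    """Collect the literal words used on the right-hand side of each rule."""
    words = set()
    for line in jsgf.splitlines():
        if "=" not in line:
            continue  # header lines such as '#JSGF V1.0;' define no rule
        rhs = re.sub(r"<[^>]*>", " ", line.split("=", 1)[1])  # drop <rule> refs
        words.update(re.findall(r"[a-z']+", rhs.lower()))
    return words


def missing_pronunciations(jsgf, pronunciations):
    """Words the grammar uses but the dictionary cannot pronounce."""
    return sorted(grammar_words(jsgf) - set(pronunciations))


if __name__ == "__main__":
    print(make_dictionary(PRONUNCIATIONS), end="")
    print(missing_pronunciations(JSGF, PRONUNCIATIONS))  # []
```

Keeping the grammar small is the point: with only a handful of allowed sequences, the decoder has far fewer competing hypotheses than with an open-vocabulary language model.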
Google ASR is a closed-source, online ASR system that converts audio to
text. It returns several candidate transcriptions of the audio input, along
with a confidence level.
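One way to use several candidate transcriptions together with a confidence level is sketched below. The result shape is an assumption made for illustration (a list of dicts, most likely first, where `confidence` may be missing); it is not the actual response format of Google's service, and the 0.6 threshold is an arbitrary placeholder. The idea is that a low top confidence means Navi should ask again, while lower-ranked candidates can still supply a keyword the top guess misheard.

```python
import re


def choose_transcript(alternatives, keywords, threshold=0.6):
    """Return (transcript, keyword) from an n-best list, or (None, None) to re-ask.

    alternatives: hypothetical list of dicts ordered most likely first, e.g.
    {"transcript": "...", "confidence": 0.8}; "confidence" may be missing.
    """
    if not alternatives:
        return None, None
    top_confidence = alternatives[0].get("confidence")
    if top_confidence is not None and top_confidence < threshold:
        return None, None  # recognizer is unsure about the audio: ask again
    for alt in alternatives:  # lower-ranked guesses can still hold the keyword
        words = set(re.findall(r"[a-z']+", alt["transcript"].lower()))
        for keyword in keywords:
            if keyword in words:
                return alt["transcript"], keyword
    return None, None


if __name__ == "__main__":
    n_best = [
        {"transcript": "take me to the lobby", "confidence": 0.82},
        {"transcript": "take me to the library"},
    ]
    print(choose_transcript(n_best, ["library", "lab"]))
```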
Automatic Speech Recognition (ASR) is the process by which a computer translates a person's speech into text. Several ASR engines were tested, including Rospeex, Pocketsphinx, and Google ASR.
Rospeex is a Robot Operating System (ROS) package. While simple to install and use, Rospeex provided the worst recognition of all the ASR systems tested. Because the package is closed source, there is no way to improve its recognition.
Figure 3. Several scripts are needed to run Pocketsphinx.
Kinect is a Microsoft sensor add-on for the Xbox gaming console. It consists of a microphone array, 3D depth sensors, and an RGB camera.
A task-oriented dialogue (conversation) was developed based on the information Navi requires in order to perform a task. To make the conversation as unconstrained as possible (meaning that users don't have to follow an exact script to converse with Navi), word spotting and regular expressions (regex) were adopted.
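Word spotting with regular expressions can be sketched as a set of keyword patterns plus one pattern that pulls out a room number, so a free-form answer still yields what the task needs. The intent names, keyword lists, and room-number pattern below are hypothetical examples, not the patterns used on Navi. Word boundaries (`\b`) keep a keyword from matching inside a longer word, so "restroom 5" is not read as "room 5".

```python
import re

# Hypothetical intent patterns; \b anchors each keyword at word boundaries
# so it is not matched inside an unrelated longer word.
INTENTS = {
    "guide": re.compile(r"\b(guide|take me|show me)\b", re.IGNORECASE),
    "deliver": re.compile(r"\b(deliver|bring|drop off)\b", re.IGNORECASE),
}
# Hypothetical slot pattern: captures the digits after "room" or "room number".
ROOM = re.compile(r"\broom\s+(?:number\s+)?(\d+)\b", re.IGNORECASE)


def spot(utterance):
    """Return (intent, room) found anywhere in the utterance; None if absent."""
    intent = next(
        (name for name, pattern in INTENTS.items() if pattern.search(utterance)),
        None,
    )
    match = ROOM.search(utterance)
    return intent, (match.group(1) if match else None)


if __name__ == "__main__":
    print(spot("Um, could you take me to room 305?"))
```

Because only the keywords matter, fillers like "um" or extra words around the request do not break recognition of the user's intent, which is what lets users answer without following a script.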

Turtlebot Poster_Summer 2016
