A Multimodal Ouija Board for Aircraft Carrier Deck Operations 
 
by 
Birkan Uzun 
S.B., C.S. M.I.T., 2015 
 
 
Submitted to the  
Department of Electrical Engineering and Computer Science 
in Partial Fulfillment of the Requirements for the Degree of  
Master of Engineering in Computer Science and Engineering 
at the 
Massachusetts Institute of Technology 
June 2016 
Copyright 2016 Birkan Uzun. All rights reserved. 
 
The author hereby grants to M.I.T. permission to reproduce and to distribute publicly paper and 
electronic copies of this thesis document in whole and in part in any medium now known or 
hereafter created.  
 
 
Author ……………………………………………………………………………………………... 
Department of Electrical Engineering and Computer Science 
April 6, 2016 
 
Certified by ………………………………………………………………………………………...  
Randall Davis, Professor 
Thesis Supervisor 
 
Accepted by ………………………………………………………………………………………..  
Dr. Christopher J. Terman 
Chairman, Masters of Engineering Thesis Committee 
 
 
 
 
A Multimodal Ouija Board for Aircraft Carrier Deck Operations 
 
by 
Birkan Uzun 
 
Submitted to the  
Department of Electrical Engineering and Computer Science 
April 6, 2016 
in Partial Fulfillment of the Requirements for the Degree of  
Master of Engineering in Computer Science and Engineering 
 
 
Abstract 
In this thesis, we present improvements to DeckAssistant, a system that provides a traditional Ouija board interface by displaying a digital rendering of an aircraft carrier deck that assists deck handlers in planning deck operations. DeckAssistant has a large digital tabletop display that shows the status of the deck and has an understanding of certain deck actions for scenario planning. To preserve the conventional way of interacting with the old-school Ouija board where deck handlers move aircraft by hand, the system takes advantage of multiple modes of interaction. Deck handlers plan strategies by pointing at aircraft, gesturing and talking to the system. The system responds with its own speech and gestures, and it updates the display to show the consequences of the actions taken by the handlers. The system can also be used to simulate certain scenarios during the planning process. The multimodal interaction described here creates a communication of sorts between deck handlers and the system. Our contributions include improvements in hand-tracking, speech synthesis and speech recognition.
 
 
Acknowledgements 
Foremost, I would like to thank my advisor, Professor Randall Davis, for his support of my work, his patience, motivation and knowledge. His door was always open whenever I had a question about my research. He consistently allowed this research to be my own work, but steered me in the right direction with his meaningful insights whenever he thought I needed it.
I would also like to thank Jake Barnwell for helping with the development environment setup and documentation.
Finally, I must express my gratitude to my parents and friends, who supported me throughout my years of study. This accomplishment would never have been possible without them.
 
Contents 
1. Introduction……………………………………………………………………………..13 
1.1. Overview…………………………………………………………………………13 
1.2. Background and Motivation……………………………….…..….………..….....14 
1.2.1. Ouija Board History and Use…………………………………………….14 
1.2.2. Naval Push for Digital Information on Decks………………………....…15 
1.2.3. A Multimodal Ouija Board………………………………………………16 
1.3. System Demonstration………………………………………………………...…17 
1.4. Thesis Outline……………………………………………………………………20 
2. DeckAssistant Functionality…………………………………………………………..21
2.1. Actions in DeckAssistant……...………………………………………………....21 
2.2. Deck Environment………...……………………………………………………..22 
2.2.1. Deck and Space Understanding…....…………………………………….22 
2.2.2. Aircraft and Destination Selection…………..…………………………...23 
2.2.3. Path Calculation and Rerouting.…………………………………………23 
2.3. Multimodal Interaction..…………………………………………………………24 
2.3.1. Input………...……………………………………………………………24 
2.3.2. Output………...………………………………………………………….24 
3. System Implementation…….…………………………………………………………..28 
3.1. Hardware………………….…...…………………………………………………28 
3.2. Software……………….......……………………………………………………..29 
3.2.1. Libraries……………………....…....…………………………………….29 
3.2.2. Architecture……………………...…………..…………………………...30 
4. Hand Tracking...……………………….…….…..………………………......…..……..32 
4.1. The Leap Motion Sensor…....……………………………………………………33 
4.1.1. Pointing Detection………....……………………………....…………….34 
4.1.2. Gesture Detection…………………………....…………………………...35 
5. Speech Synthesis and Recognition……………………………………………………..37 
5.1. Speech Synthesis……….……...…………………………………………………37 
5.2. Speech Recognition…..…...…………………………………………………..…38 
5.2.1. Recording Sound………………......……………………………………..38 
5.2.2. Choosing a Speech Recognition Library..…..…………………………...39 
5.2.3. Parsing Speech Commands…....…………………………………………40 
5.2.4. Speech Recognition Stack in Action……………………………………..41 
6. Related Work…………………….……………………………………………………..44 
6.1. Navy ADMACS.……….……...…………………………………………………44 
6.2. Deck Heuristic Action Planner……....…………………………………………..44 
7. Conclusion….…………………….……………………………………………………..45 
7.1. Future Work…...……….……...…………………………………………………46 
8. References…..…………………….……………………………………………………..47 
9. Appendix…....…………………….……………………………………………………..48 
9.1. Code and Documentation....…...…………………………………………………48 
 
 
 
List of Figures  
Figure 1: Deck handlers collaboratively operating on an Ouija Board. Source: Google Images..15 
Figure 2: The ADMACS Ouija board. Source: Google Images…………………………………16 
Figure 3:  DeckAssistant’s tabletop display with the digital rendering of the deck [1]...........…..17 
Figure 4: A deck handler using DeckAssistant with hand gestures and speech commands [1]....18  
Figure 5: The initial arrangement of the deck [1]..........................................................................19 
Figure 6: Deck handler points at the aircraft to be moved while speaking the command [1].......19 
Figure 7: DeckAssistant uses graphics to tell the deck handler that the path to destination is                               
blocked [1].....................................................................................................................................19 
Figure 8: DeckAssistant displays an alternate location for the F-18 that is blocking the path [1]..........20
Figure 9:  The logic for moving aircraft [1]...................................................................................22 
Figure 10: Regions on an aircraft carrier’s deck. Source: Google Images....................................23 
Figure 11: (a) Orange dot represents where the user is pointing at. (b) Aircraft being hovered                               
over is highlighted green [1]..........................................................................................................25 
Figure 12: (a) Single aircraft selected. (b) Multiple aircraft selected [1]......................................25 
Figure 13: Aircraft circled in red, meaning there is not enough room in region [1].....................26 
Figure 14: Alternate region to move the C-2 is highlighted in blue [1]........................................27
Figure 15: The hardware used in DeckAssistant………………………………………………...28 
Figure 16: DeckAssistant software architecture overview……………………………………....31 
Figure 17: The Leap Motion Sensor mounted on the edge of the tabletop display……………..33
Figure 18: Leap Motion’s InteractionBox, colored in red. Source: Leap Motion Developer                       
Portal……………………………………………………………………………………………..35 
Figure 19: Demonstration of multiple aircraft selection with the pinch gesture………………...36 
Figure 20: A summary of how the speech recognition stack works……………………………..43 
 
List of Tables  
Table 1: Set of commands that are recognized by DeckAssistant……………………………….41 
 
List of Algorithms  
Algorithm 1: Summary of the pointing detection process in pseudocode…………………….…35 
 
1. Introduction 
1.1. Overview 
In this thesis, we present improvements to DeckAssistant, a digital aircraft carrier Ouija Board interface that aids deck handlers in planning deck operations. DeckAssistant supports multiple modes of interaction, aiming to improve the user experience over traditional Ouija Boards. Using hand-tracking, gesture recognition and speech recognition, it allows deck handlers to plan deck operations by pointing at aircraft, gesturing and talking to the system. It responds with its own speech using speech synthesis and updates the display, a digital rendering of the aircraft carrier deck, to show results when deck handlers take action. The multimodal interaction described here creates a communication of sorts between deck handlers and the system. DeckAssistant has an understanding of deck objects and operations, and can be used to simulate certain scenarios during the planning process.
The initial work on DeckAssistant was done by Kojo Acquah, and we build upon his implementation [1]. Our work makes the following contributions to the fields of Human-Computer Interaction and Intelligent User Interfaces:
● It discusses how using the Leap Motion Sensor is an improvement over the Microsoft Kinect in terms of hand-tracking, pointing and gesture recognition.
● It presents a speech synthesis API which generates speech that has high pronunciation quality and clarity. It investigates several speech recognition APIs, argues which one is the most applicable, and introduces a way of enabling voice-activated speech recognition.
● Thanks to the refinements in hand-tracking and speech, it provides a natural, multimodal way of interacting with the first large-scale Ouija Board alternative that has been built to help with planning deck operations.
1.2. Background and Motivation 
1.2.1. Ouija Board History and Use 
The flight deck of an aircraft carrier is a complex scene, riddled with incoming aircraft, personnel moving around to take care of a variety of tasks, and the ever-present risk of hazards and calamity. Flight Deck Control (FDC) is where the deck scene is coordinated, and during flight operations it is one of the busiest places on the ship. The deck handlers in FDC send instructions to the aircraft directors on the flight deck, who manage all aircraft movement, placement and maintenance for the deck regions they are responsible for.
FDC is filled with computer screens and video displays of all that is occurring outside on deck, but it is also home to one of the most crucial pieces of equipment in the Navy: the Ouija board (Figure 1). The Ouija board is a waist-high replica of the flight deck at 1/16 scale that has all the markings of the flight deck, as well as its full complement of aircraft, all in cutout models and all tagged with items like thumbtacks and bolts to designate their status. The board offers an immediate glimpse of the deck status and gives the deck handlers in charge the ability to manipulate the model deck objects and make planning decisions, should the need arise. The board has been in use since World War II and has provided a platform of collaboration for deck handlers in terms of strategy planning for various scenarios on deck.
It is widely understood that the first round of damage to a ship will likely take out the electronics; so, to ensure the ship remains functional in battle, everything possible has a mechanical backup. Even though the traditional board has the advantage of being immune to electronic failures, there is potential for digital Ouija board technology to enhance the deck-operation-planning functionality and experience.
 
Figure 1: Deck handlers collaboratively operating on an Ouija Board. Source: Google Images. 
1.2.2.  Naval Push for Digital Information on Decks 
Even though the Ouija board has been used to track aircraft movement on aircraft carriers for over seventy years, the Navy is working on a computerized replacement due to the limitations of the current model. As one of the simplest systems aboard Navy ships, the Ouija board can only be updated manually: the deck handlers move models of aircraft and other assets around the model deck to match the movements of their real-life counterparts. The board does not offer any task automation, information processing or validation to help with strategy planning for various deck scenarios.
 
 
Figure 2: The ADMACS Ouija board. Source: Google Images. 
The new Ouija board replacement (Figure 2) is part of the Aviation Data Management                           
and Control System (ADMACS) [2], a set of electronic upgrades for carriers designed to make                             
use of the latest technologies. This system requires the deck handler to track flight deck activity                               
via computer, working with a monitor that will be fed data directly from the flight deck. In                                 
addition, the deck handler can move aircraft around on the simulated deck view using mouse and                               
keyboard.  
1.2.3. A Multimodal Ouija Board 
The ADMACS Ouija board fixes the problem of updating the deck status in real time without manual work. It also allows the deck handlers to move aircraft on the simulated deck view using mouse and keyboard, as noted. However, most deck handlers are apparently skeptical of replacing the existing system, taking the view that things that are not broken should not be fixed [6]. Considering these facts, imagine a new Ouija board with a large digital tabletop display that could show the status of the deck and had an understanding of certain deck actions for scenario planning. To preserve the conventional way of interacting with the old-school Ouija board, where deck handlers move aircraft by hand, the system would take advantage of multiple modes of interaction. Utilizing hand-tracking and speech recognition techniques, the system could let deck handlers point at objects on deck and speak their commands. In return, the system could respond with its own synthesized speech and update the graphics to illustrate the consequences of the commands given by the deck handlers. This would create a two-way communication between the system and the deck handlers.
1.3. System Demonstration 
To demonstrate how the multimodal Ouija Board discussed in Section 1.2.3 works in                         
practice and preview DeckAssistant in action, we take a look at an example scenario from [1]                               
where a deck handler is trying to prepare an aircraft for launch on a catapult. The deck handler                                   
needs to move the aircraft-to-be-launched to the catapult while moving other aircraft that are
blocking the way to other locations on deck. 
The system has a large tabletop display showing a digital, realistic rendering of an                           
aircraft carrier deck with a complete set of aircraft (Figure 3).  
 
Figure 3: DeckAssistant’s tabletop display with the digital rendering of the deck [1]. 
The deck handler stands in front of the table and issues commands using both hand                             
gestures and speech (Figure 4). DeckAssistant uses either the Leap Motion Sensor (mounted on                           
the edge of the display) or the Microsoft Kinect (mounted above the display) for hand-tracking.
The deck handler wears a wireless Bluetooth headset that supports a two­way conversation with                           
the system through speech. 
 
Figure 4: A deck handler using DeckAssistant with hand gestures and speech commands [1].  
Figure 5 shows the initial aircraft arrangement of the deck. There are eleven F-18s (grey strike fighter jets) and two C-2s (white cargo aircraft) placed on the deck. There are four catapults at the front of the deck, and two of them are open. The deck handler will now try to launch one of the C-2s on one of the open catapults, which requires moving a C-2 from the elevator, at the rear of the deck, to an open catapult, at the front of the deck.
After viewing the initial arrangement of the deck, the deck handler points at the aircraft to be moved, the lower C-2, and speaks the following command: "Move this C-2 to launch on Catapult 2". The display shows where the deck handler is pointing with an orange dot, and the selected aircraft is highlighted in green (Figure 6).
 
Figure 5: The initial arrangement of the deck [1].  
 
Figure 6: Deck handler points at the aircraft to be moved while speaking the command [1]. 
Now, DeckAssistant performs its analysis to figure out whether the command given by the deck handler can be accomplished without any extra action. In this case, there is an F-18 blocking the path the C-2 needs to take to reach the catapult (Figure 7).
 
Figure 7: DeckAssistant uses graphics to tell the deck handler that the path to destination is blocked [1].  
DeckAssistant knows that the F-18 has to be moved out of the way. It uses graphics and synthesized speech to let the deck handler know that additional actions need to be taken, and asks for the handler's permission in the form of a yes-no question (Figure 8).
 
Figure 8: DeckAssistant displays an alternate location for the F-18 that is blocking the path [1].
The aircraft are moved in the simulation if the deck handler agrees to the actions proposed by the system. If not, the system reverts to the state before the command. If the deck handler does not like the action proposed by the system, they can cancel the command and move aircraft around based on their own strategies. The goal of DeckAssistant here is to take care of small details so that the deck handler can focus on the more important deck operations without wasting time.
1.4. Thesis Outline 
In the next section, we talk about what types of actions are available in DeckAssistant and how they are taken, what the system knows about the deck environment, and how the multimodal interaction works. Section 3 discusses the hardware and software used and introduces the software architecture behind DeckAssistant. Sections 4 and 5 look at implementation details, discussing hand-tracking, speech synthesis and recognition. Section 6 talks about related work. Section 7 discusses future work and concludes.
 
 
 
2. DeckAssistant Functionality 
This section gives an overview of actions available in DeckAssistant, discusses what                       
DeckAssistant knows about the deck environment and the objects, and explains how the                         
multimodal interaction happens.  
2.1. Actions in DeckAssistant 
The initial version of DeckAssistant focuses only on simple deck actions for aircraft movement and placement. These actions allow deck handlers to perform tasks such as moving an aircraft from one location to another or preparing an aircraft for launch on a catapult. These deck actions comprise the logic needed to perform a command given by the deck handler (Figure 9). As the example in Section 1.3 suggests, these actions are built to be flexible and interactive. This means that the deck handler is always consulted for their input during an action; they can make alterations with additional commands, or they can suggest alternate actions if needed. The system takes care of the details, saving the deck handler's time and allowing them to concentrate on more important tasks.
There are four actions available within DeckAssistant, as noted in [1]:
● Moving aircraft from start to destination. 
● Finding an alternate location for aircraft to move if the intended destination is full. 
● Clearing a path for aircraft to move from start to end location. 
● Moving aircraft to launch on catapults. 
 
Figure 9: The logic for moving aircraft [1]. 
2.2. Deck Environment 
DeckAssistant has an understanding of the deck environment, which includes various                     
types of aircraft, regions on deck and paths between regions (See Chapter 4 of [1] for the                                 
implementation details of the deck environment and objects).  
2.2.1. Deck and Space Understanding
DeckAssistant's user interface represents a scale model of a real deck, just like a traditional Ouija Board. The system displays the status of aircraft on this user interface and uses the same naming scheme that the deck handlers use for particular regions of the deck (Figure 10). The deck handlers can thus refer to those regions by their names when using the system. Each of these regions contains a set of parking spots in which aircraft can reside. These parking spots help the system determine the arrangement of parked aircraft and figure out the occupancy of a region. This means that the system knows if a region has enough room to move aircraft to or if the path from one region to another is clear.
 
Figure 10: Regions on an aircraft carrier’s deck. Source: Google Images.  
2.2.2. Aircraft and Destination Selection 
Each aircraft on deck is a unique object that has a tail number (displayed on each aircraft), type, position, status and other information that is useful for the system's simulation. Currently, we support two different types of aircraft within DeckAssistant: F-18s and C-2s.
Selection of aircraft can be done in two ways. The deck handler can either point at the aircraft (single or multiple), as shown in the example in Section 1.3, or they can refer to the aircraft by their tail numbers, for instance, "Aircraft Number-8".
Destination selection is similar. Since destinations are regions on the deck, they can be referred to by their names or they can be pointed at.
2.2.3. Path Calculation and Rerouting 
During path planning, the system draws straight lines between regions and uses the                         
wingspan length as the width of the path to make sure that there are no aircraft blocking the way                                     
and that the aircraft to move can fit into its path.  
If a path is clear but the destination does not have enough open parking spots, the system                                 
suggests alternate destinations and routes, checking the nearest neighboring regions for open                       
spots. 
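To make this corridor test concrete, the following is a minimal Java sketch of a straight-line clearance check, assuming a simplified 2D deck model in which each aircraft is reduced to a point with a wingspan. The Aircraft type and its accessors are illustrative stand-ins for exposition, not DeckAssistant's actual deck objects (see Chapter 4 of [1] for those).

    import java.awt.geom.Line2D;
    import java.awt.geom.Point2D;
    import java.util.List;

    // Minimal sketch of the path-clearance test described above. The Aircraft
    // interface and the corridor model are illustrative assumptions.
    public class PathChecker {

        /** Returns true if no parked aircraft intrudes into the straight corridor
         *  whose width equals the moving aircraft's wingspan. */
        public static boolean isPathClear(Point2D start, Point2D end,
                                          double wingspan,
                                          List<Aircraft> parkedAircraft,
                                          Aircraft moving) {
            double halfWidth = wingspan / 2.0;
            for (Aircraft other : parkedAircraft) {
                if (other == moving) continue;                 // ignore the aircraft being moved
                double clearance = halfWidth + other.getWingspan() / 2.0;
                // Distance from the blocking aircraft's position to the corridor centerline.
                double dist = Line2D.ptSegDist(start.getX(), start.getY(),
                                               end.getX(), end.getY(),
                                               other.getPosition().getX(),
                                               other.getPosition().getY());
                if (dist < clearance) {
                    return false;                              // corridor is blocked
                }
            }
            return true;
        }

        // Illustrative stand-in for the real deck object model.
        public interface Aircraft {
            Point2D getPosition();
            double getWingspan();
        }
    }

If the destination region has no open spots, the same check can be repeated against the nearest neighboring regions, which is the rerouting behavior described above.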
2.3. Multimodal Interaction 
The goal of the multimodal interaction created by DeckAssistant’s user interface is to                         
create a communication between the deck handler and the system. The input in this interaction is                               
a combination of hand gestures and speech performed by the deck handler. The output is the                               
system’s response with synthesized speech and graphical updates.  
2.3.1. Input 
DeckAssistant uses either the Leap Motion Sensor or the Microsoft Kinect for tracking hands. Hand-tracking allows the system to recognize certain gestures using the position of the hands and fingertips. Currently, the system can only interpret pointing gestures where the deck handler points at aircraft or regions on the deck.
Commands are spoken into the microphone of the wireless Bluetooth headset that the deck handler wears, allowing the deck handler to issue a command using speech alone. In this case, the deck handler has to provide the tail number of the aircraft to be moved as well as the destination name. An example could be: "Move Aircraft Number-8 to the Fantail". Alternatively, the deck handler can combine speech with one or more pointing gestures. In this case, for example, the deck handler can point at an aircraft to be moved and say "Move this aircraft", and then point at the destination and say "over there".
2.3.2. Output 
The system is very responsive to any input. As soon as the deck handler makes a pointing gesture, an orange dot appears on the screen, indicating where the deck handler is pointing (Figure 11 (a)). If the deck handler is pointing at an aircraft, the system highlights that aircraft in green, indicating a potential for selection (Figure 11 (b)). Eventually, if the deck handler takes an action to move aircraft on deck, the selected aircraft are highlighted in orange. As mentioned earlier, the deck handler can select multiple aircraft (Figure 12).
      
Figure 11: (a) Orange dot represents where the user is pointing at. (b) Aircraft being hovered over is 
highlighted green [1]. 
    
Figure 12: (a) Single aircraft selected. (b) Multiple aircraft selected [1]. 
The system's responses to the deck handler's input depend on the type of action and the aircraft arrangement on deck. If a certain action can be processed without additional actions, the system completes it and confirms it by saying "Okay, done". If the action cannot be completed for any reason, the system explains why using its synthesized speech and graphical updates, and asks for the deck handler's permission to take an alternate action. In the case of deck handler approval, the system updates the arrangement on deck. If the deck handler declines the suggested alternate action, the system reverts to its previous state before the deck handler issued their command.
Section 1.3 gave an example of this scenario: the system warned the user of the aircraft that was blocking the path to a catapult and recommended an alternate spot for the aircraft blocking the way. Once the deck handler approved, the system could move the aircraft to launch on the catapult.
Let's take a look at another scenario. Figure 13 shows an example of a situation where a C-2 cannot be moved to the fantail since there are no open parking spots there. The system circles all the blocking aircraft in red and suggests an alternate region on deck to move the C-2. In that case, the new region is highlighted in blue and a clear path to it is drawn (Figure 14). If the deck handler accepts this suggested region, the system moves the C-2 there. If not, it reverts to its original state and waits for new commands.
 
Figure 13: Aircraft circled in red, meaning there is not enough room in region [1]. 
 
Figure 14: Alternate region to move the C-2 is highlighted in blue [1].
 
3. System Implementation 
In this section, we introduce DeckAssistant’s hardware setup, the software libraries used                       
and the software architecture design. 
3.1. Hardware 
 
Figure 15: The hardware used in DeckAssistant. 
As it can be seen in Figure 15, DeckAssistant’s hardware setup consists of: 
● Four downward-facing Dell 5100MP projectors mounted over the tabletop. These projectors create a 42 by 32 inch seamless display with a 2800 x 2100 pixel resolution.
● A white surface digitizer. The display is projected onto this surface. 
● A Leap Motion Sensor or a Microsoft Kinect (V1) for tracking hands over the table                             
surface. The system can use either sensor. 
● A Logitech C920 Webcam for viewing the entire surface. This webcam is used to                           
calibrate the seamless display using the ​ScalableDesktop Classic​ software. 
● A wireless Bluetooth headset for supporting a two­way conversation with the system. 
This setup is powered by a Windows 7 desktop computer with an AMD Radeon HD 6870 graphics card. It should be noted that the need for the surface digitizer, projectors and webcam would be eliminated if the system were configured to use a flat panel for the display.
3.2. Software  
All of DeckAssistant's code is written in Java 7 in the form of a stand-alone application. This application handles all the system functionality: graphics, speech recognition, speech synthesis, and gesture recognition.
3.2.1. Libraries 
Four libraries are used to provide the desired functionality: 
● Processing: for graphics;  it is a fundamental part of our application framework. 
● AT&T Java Codekit: for speech recognition. 
● Microsoft Translator Java API: for speech synthesis. 
● Leap Motion Java SDK: provides the interface to the Leap Motion Controller sensor for hand-tracking.
 
 
3.2.2. Architecture 
DeckAssistant's software architecture is structured around three stacks that handle the multimodal input and output. These three stacks run in parallel and are responsible for speech synthesis, speech recognition and hand-tracking. The Speech Synthesis Stack constructs sentences in response to a deck handler's command and generates an audio file for each sentence, which is played through the system's speakers. The Speech Recognition Stack constantly listens for commands, does speech-to-text conversion and parses the text to figure out the command that was issued. The Hand-Tracking Stack interfaces with either the Leap Motion Sensor or the Microsoft Kinect, processes the data received and calculates the position of the user's pointing finger over the display, as well as detecting additional gestures. These three stacks each provide an API (Application Programming Interface) so that the other components within DeckAssistant can communicate with them for a multimodal interaction.
Another crucial part of the architecture is the Action Manager component. The Action Manager's job is to manipulate the deck by communicating with the three multimodal interaction stacks. Once a deck handler's command is interpreted, it is passed into the Action Manager, which updates the deck state and objects based on the command and responds by leveraging the Speech Synthesis Stack and graphics.
Finally, all of these stacks and components run on a Processing loop that executes every 30 milliseconds. Each execution of this loop makes sure the multimodal input and output are processed. Figure 16 summarizes the software architecture. The DeckAssistant Software Guide (see Appendix for URL) details the implementation of each component within the system.
 
Figure 16: DeckAssistant software architecture overview. 
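As a rough illustration of how the three stacks and the Action Manager might fit together on the Processing loop, here is a hedged Java sketch. The interface and method names below are assumptions made for exposition only; they do not match DeckAssistant's actual classes, which are documented in the DeckAssistant Software Guide.

    // Illustrative sketch of the 30 ms loop tying the stacks together.
    public class DeckAssistantLoop {

        interface HandTrackingStack      { java.awt.Point pointerOnDisplay(); }  // null if no hand
        interface SpeechRecognitionStack { ParsedCommand pollCommand(); }        // null if nothing new
        interface SpeechSynthesisStack   { void say(String sentence); }          // queued playback
        interface ParsedCommand {}
        interface ActionManager {
            void updatePointer(java.awt.Point displayPoint);
            String handle(ParsedCommand cmd);                // a sentence to speak, or null
        }

        private final HandTrackingStack hands;
        private final SpeechRecognitionStack speech;
        private final SpeechSynthesisStack voice;
        private final ActionManager actions;

        DeckAssistantLoop(HandTrackingStack h, SpeechRecognitionStack r,
                          SpeechSynthesisStack s, ActionManager a) {
            hands = h; speech = r; voice = s; actions = a;
        }

        /** Called from the Processing draw loop roughly every 30 ms. */
        public void tick() {
            java.awt.Point pointer = hands.pointerOnDisplay();
            if (pointer != null) {
                actions.updatePointer(pointer);              // drives the orange dot and highlighting
            }
            ParsedCommand cmd = speech.pollCommand();
            if (cmd != null) {
                String reply = actions.handle(cmd);          // may combine with a pending base command
                if (reply != null) {
                    voice.say(reply);                        // e.g. "Okay, done."
                }
            }
        }
    }

The point of the sketch is the design choice itself: the stacks expose small polling-style APIs, and one periodic loop moves data between them and the Action Manager.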
 
4. Hand Tracking 
In Chapter 5 of his thesis [1], Kojo Acquah discusses methods for tracking hands and recognizing pointing gestures using a Microsoft Kinect (V1). These initial hand-tracking methods of DeckAssistant can only recognize outstretched fingers on hands that are held mostly perpendicular to the focal plane of the camera. They do not work well with other hand poses, leaving no way to recognize other gestures. The authors of [8] provide a detailed analysis of the accuracy and resolution of the Kinect sensor's depth data. Their experimental results show that the random error in depth measurement increases with increasing distance to the sensor, ranging from a few millimeters to approximately 4 centimeters at the maximum range of the sensor. The quality of the data is also affected by the low resolution of the depth measurements, which depends on the frame rate (30 fps [7]). The authors thus suggest that the obtained accuracy is, in general, sufficient for detecting arm and body gestures, but not sufficient for precise finger tracking and hand gestures. Experimenting with DeckAssistant's initial version to take certain actions, we observed laggy, low-accuracy hand-tracking performance from the Kinect sensor. In addition, the Kinect always has to be calibrated before DeckAssistant can be used, which is a time-consuming process. Finally, the current setup has a usability problem: when deck handlers stand in front of the tabletop and point at the aircraft on the display, their hands block the projectors' light, causing shadows on the display.
The authors of [9] present a study of the accuracy and robustness of the Leap Motion Sensor. They use an industrial robot with a reference pen, allowing suitable position accuracy for the experiment. Their results show high precision (an overall average accuracy of 0.7 mm) in fingertip position detection. Even though they do not achieve the accuracy of 0.01 mm stated by the manufacturer [3], they claim that the Leap Motion Sensor performs better than the Microsoft Kinect in the same experiment.
This section describes our use of the Leap Motion Sensor to track hands and recognize gestures, allowing for a high degree of subjective robustness.
4.1. The Leap Motion Sensor 
The Leap Motion Sensor is a 3-inch-long USB device that tracks hand and finger motions. It works by projecting infrared light upward from the device and detecting reflections using monochromatic infrared cameras. Its field of view extends from 25 mm to 600 mm above the device with a 150° spread and a high frame rate (>200 fps) [3]. In addition, the Leap Motion Sensor's Application Programming Interface (API) provides more information about the hands than the Microsoft Kinect's (V1).
 
Figure 17: The Leap Motion Sensor mounted on the edge of the tabletop display. 
The Leap Motion Sensor is mounted on the edge of the tabletop display, as shown above in Figure 17. In this position, hands no longer block the projectors' light, thereby eliminating the shadows on the display. The sensor also removes the need for calibration before use, enabling DeckAssistant to run without any extra work. Finally, thanks to its accuracy in finger tracking, the sensor creates the opportunity for more hand gestures to express detail in deck actions (see Section 4.1.2).
4.1.1. Pointing Detection 
The Leap Motion API provides us with motion tracking data as a series of frames. Each frame contains measured positions and other information about detected entities. Since we are interested in detecting pointing, we look at the fingers. The Pointable class in the API reports the physical characteristics of detected extended fingers, such as tip position and direction. From these extended fingers, we choose the pointing finger as the one that is farthest toward the front in the standard Leap Motion frame of reference. Once we have the pointing finger, we retrieve its tip position by calling the Pointable class's stabilizedTipPosition() method. This method applies smoothing and stabilization to the tip position, removing the flickering caused by sudden hand movements and yielding a more accurate pointing detection that improves the interaction with our 2D visual content. The stabilized tip position lags behind the original tip position by a variable amount (not specified by the manufacturer) [3], depending on the speed of movement.
Finally, we map the tip position from the Leap Motion coordinate system to our system's 2D display. For this, we use the API class InteractionBox. This class represents a cuboid-shaped region contained in the Leap Motion's field of view (Figure 18). The InteractionBox provides normalized coordinates for detected entities within itself. Calling the normalizePoint() method of this class returns the normalized 3D coordinates of the tip position within the range [0...1]. Multiplying the X and Y components of these normalized coordinates by our system's screen dimensions, we complete the mapping process and obtain the 2D coordinates in our display. Algorithm 1 summarizes the pointing detection process.
   
Figure 18: Leap Motion’s InteractionBox, colored in red. Source: Leap Motion Developer Portal. 
 
Algorithm 1: Summary of the pointing detection process in pseudocode.  
As discussed in Section 2.3.2, the mapped tip position is displayed on the screen as an                               
orange dot.  
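Since Algorithm 1 is presented as pseudocode, the following is a minimal Java sketch of the same steps using the Leap Motion Java SDK. The screen-dimension constants are placeholders taken from the projector resolution in Section 3.1, and no axis is flipped here; whether that is needed depends on how the sensor is mounted. The class is an illustration, not DeckAssistant's actual implementation.

    import com.leapmotion.leap.Controller;
    import com.leapmotion.leap.Finger;
    import com.leapmotion.leap.FingerList;
    import com.leapmotion.leap.Frame;
    import com.leapmotion.leap.InteractionBox;
    import com.leapmotion.leap.Vector;

    // Sketch of Algorithm 1: pick the frontmost extended finger, stabilize its tip,
    // normalize it inside the InteractionBox, and scale to display pixels.
    public class PointingDetector {
        private static final int SCREEN_WIDTH  = 2800;   // placeholder, see Section 3.1
        private static final int SCREEN_HEIGHT = 2100;   // placeholder

        private final Controller controller = new Controller();

        /** Returns the pointed-at pixel as {x, y}, or null if no extended finger is seen. */
        public int[] pointedPixel() {
            Frame frame = controller.frame();
            FingerList extended = frame.fingers().extended();
            if (extended.isEmpty()) {
                return null;
            }
            // "Farthest toward the front" in the Leap frame of reference.
            Finger pointing = extended.frontmost();
            Vector tip = pointing.stabilizedTipPosition();      // smoothed tip position

            // Map the tip into [0, 1] coordinates inside the InteractionBox,
            // then scale the X and Y components to the display.
            InteractionBox box = frame.interactionBox();
            Vector normalized = box.normalizePoint(tip, true);  // clamp to the box
            int x = Math.round(normalized.getX() * SCREEN_WIDTH);
            int y = Math.round(normalized.getY() * SCREEN_HEIGHT);
            return new int[] { x, y };
        }
    }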
4.1.2. Gesture Detection 
We implemented a new gesture for multiple aircraft selection, using a combination of pointing and pinching. The deck handler can point with their index finger while pinching with their thumb and middle finger to select multiple aircraft. We detect this gesture using the Leap Motion API's pinchStrength() method. The value returned by this method approaches 1 when the deck handler is pinching and 0 otherwise. However, since this value can fluctuate with movements of the deck handler's hand due to the device's sensitivity, we apply a moving average to make sure that the majority of the values we receive from the method indicate pinching. In addition, we recognize this gesture only if the user is pinching with the thumb and the middle finger. We check this by iterating through the list of fingers in a frame and measuring the distance between each fingertip and the thumb's tip position; for a valid pinch, the middle finger's tip must be the closest to the thumb's. The reason for this check is that we do not want to recognize other hand poses as a pinch gesture. For example, if the deck handler is pointing with their index finger while the other fingers are curled, the system might otherwise think that the user is pinching; this check, together with the moving-average pinch strength, prevents the recognition of such cases. Figure 19 shows an example of multiple aircraft selection using the pinch gesture.
 
Figure 19: Demonstration of multiple aircraft selection with the pinch gesture. 
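A minimal Java sketch of this pinch test follows. The moving-average window size and the strength threshold are illustrative values, not DeckAssistant's tuned parameters.

    import com.leapmotion.leap.Finger;
    import com.leapmotion.leap.Hand;
    import com.leapmotion.leap.Vector;
    import java.util.ArrayDeque;
    import java.util.Deque;

    // Sketch of the pinch check: smoothed pinchStrength() plus a thumb-to-middle-finger test.
    public class PinchDetector {
        private static final int   WINDOW          = 10;    // frames in the moving average (illustrative)
        private static final float PINCH_THRESHOLD = 0.8f;  // averaged pinchStrength() cutoff (illustrative)

        private final Deque<Float> recentStrengths = new ArrayDeque<Float>();

        /** Returns true if the hand is pinching with the thumb and middle finger. */
        public boolean isSelectingPinch(Hand hand) {
            // Moving average of pinchStrength() to smooth out jitter.
            recentStrengths.addLast(hand.pinchStrength());
            if (recentStrengths.size() > WINDOW) {
                recentStrengths.removeFirst();
            }
            float sum = 0;
            for (float s : recentStrengths) sum += s;
            boolean pinching = (sum / recentStrengths.size()) > PINCH_THRESHOLD;
            if (!pinching) return false;

            // Require that the fingertip closest to the thumb is the middle finger, so a
            // pointing pose with curled fingers is not mistaken for a selection pinch.
            Vector thumbTip = null;
            for (Finger f : hand.fingers()) {
                if (f.type() == Finger.Type.TYPE_THUMB) thumbTip = f.tipPosition();
            }
            if (thumbTip == null) return false;

            Finger.Type closest = null;
            float closestDist = Float.MAX_VALUE;
            for (Finger f : hand.fingers()) {
                if (f.type() == Finger.Type.TYPE_THUMB) continue;
                float d = f.tipPosition().distanceTo(thumbTip);
                if (d < closestDist) {
                    closestDist = d;
                    closest = f.type();
                }
            }
            return closest == Finger.Type.TYPE_MIDDLE;
        }
    }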
5. Speech Synthesis and Recognition 
This section details the improvements in Speech Synthesis and Speech Recognition for                       
DeckAssistant. 
5.1. Speech Synthesis 
The initial version of DeckAssistant, as discussed in [1, Section 6.1], used the FreeTTS package for speech synthesis. Even though FreeTTS provides an easy-to-use API and is compatible with many operating systems, it lacks pronunciation quality and clarity in speech. To solve this problem, we implemented a speech synthesizer interface that acts as a front to any speech synthesis library that we plug in. One library that works successfully with our system is the Microsoft Translator API, a cloud-based automatic machine translation service that supports multiple languages. Since our application uses the English language, we do not use any of the translation features of the service. Instead, we use it to generate a speech file from the text we feed in.
As explained in Section 3.2.2, speech is synthesized in response to a deck handler's commands. Any module in the software can call the Speech Synthesis Engine of the Speech Synthesis Stack to generate speech. Once called, the Speech Synthesis Engine feeds the text to be spoken into the Microsoft Translator API through the interface we created. The interface then makes a request to the Microsoft Translator service, which returns a WAV file that we play through our system's speakers. In the case of multiple speech synthesis requests, the system queues the requests and handles them in order. Using the Microsoft Translator API enables us to provide high-quality speech synthesis with clear voices. It should be noted that future developers of DeckAssistant can incorporate any speech synthesis library into the system with ease.
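The following is a rough Java sketch of what such a synthesizer front might look like. The interface name, method signature and placeholder backend are assumptions for illustration only; they are not the interface actually shipped with DeckAssistant.

    import java.io.File;

    // Rough sketch of a pluggable synthesizer front. Names and signatures are illustrative.
    public interface SpeechSynthesizer {

        /** Synthesizes the sentence and returns an audio file ready to be played. */
        File synthesize(String text);

        /** Example of a pluggable backend; the method bodies only mark where the
         *  request to a speech service and the WAV write would go. */
        class TranslatorBackend implements SpeechSynthesizer {
            @Override
            public File synthesize(String text) {
                byte[] wav = requestSpeech(text);   // placeholder for the service call
                return writeTempWav(wav);           // placeholder for writing a temp .wav
            }
            private byte[] requestSpeech(String text) { return new byte[0]; }
            private File writeTempWav(byte[] wav) { return new File("speech.wav"); }
        }
    }

Because callers depend only on the interface, swapping FreeTTS, Microsoft Translator, or a future library behind it does not touch the rest of the system, which is the design goal described above.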
5.2. Speech Recognition 
The CMU Sphinx 4 library is used for recognizing speech in the initial version of                             
DeckAssistant [1, Section 6.2]. Even though Sphinx provides an easy API to convert speech into                             
text with acoustic models and a grammar (rules for specific phrase construction) of our choice,                             
the speech recognition performance is poor in terms of recognition speed and accuracy. In the                             
experiments we ran during development, we ended up repeating ourselves several times until the                           
recognizer picked up what we were saying. In response, we introduced a speech recognizer                           
interface that provides us with the flexibility to use any speech recognition library. Other                           
modules in DeckAssistant can call this interface and use the recognized speech as needed.  
5.2.1. Recording Sound 
The user can talk to DeckAssistant at any time, without the need for extra actions such as push-to-talk or gestures. For this reason, the system must constantly record from the microphone, detect when the user is done issuing a command, and generate a WAV file of the spoken command. Sphinx's Live Speech Recognizer took care of this by default. However, since the speech recognition library we decided to use (discussed in the next section) did not provide any live speech recognition, we had to implement our own sound recorder that generates WAV files with the spoken commands. For this task, we use SoX (Sound eXchange), a cross-platform command line utility that can record audio files and process them. The SoX command constantly runs in the background to record any sound. It stops recording once no sound is detected after the user has started speaking. It then trims out certain noise bursts and writes the recorded speech to a WAV file, which is sent back to DeckAssistant. Once the speech recognizer is done with the speech-to-text operation, this background process is run again to record new commands. For more details about SoX, please refer to the SoX documentation [4].
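As an illustration of driving SoX from Java, here is a hedged sketch. The silence-effect parameters shown are generic values from common SoX recipes, not the thresholds DeckAssistant actually uses, and the noise-burst trimming mentioned above is omitted; see the SoX documentation [4] for the exact semantics of the silence effect.

    import java.io.File;
    import java.io.IOException;

    // Sketch of launching SoX for voice-activated recording of one utterance.
    public class CommandRecorder {

        /** Blocks until SoX has recorded one utterance bounded by silence. */
        public File recordUtterance() throws IOException, InterruptedException {
            File out = new File("command.wav");
            // "silence 1 0.1 3% 1 2.0 3%": start keeping audio once the input rises above
            // 3% of full scale for 0.1 s, and stop after 2 s below that level.
            ProcessBuilder pb = new ProcessBuilder(
                    "sox", "-d", out.getAbsolutePath(),
                    "silence", "1", "0.1", "3%", "1", "2.0", "3%");
            pb.inheritIO();                 // surface SoX output for debugging
            Process sox = pb.start();
            sox.waitFor();                  // returns when SoX stops after the trailing silence
            return out;
        }
    }

In a loop, the returned WAV file would be handed to the speech recognizer, and recordUtterance() would be called again once transcription finishes, matching the restart behavior described above.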
5.2.2. Choosing a Speech Recognition Library 
To pick the most suitable speech recognition library for our needs, we experimented with                           
four popular APIs: 
● Google Speech: It did not provide an official API. We had to send an HTTP request to their service with a recorded WAV file to get the speech-to-text response, and were limited to 50 requests per day. Even though the responses for the random sentences we used for testing were accurate, it did not work very well for our own grammar, since the library does not provide any grammar configuration. A simple example is the sentence "Move this C-2": the recognizer thought that we were saying "Move this see too". Since we had a lot of similar issues with other commands, we decided not to use this library.
● IBM Watson Speech API: A brand new, easy-to-use API. It transcribed the incoming audio and sent it back to our system with minimal delay, and speech recognition seemed to improve as it heard more. However, like Google Speech, it did not provide any grammar configuration, which caused inaccuracy in recognizing certain commands in our system. Therefore, we did not use this library.
● Alexa Voice Service: Amazon recently made this service available. Even though the speech recognition works well for the purposes it was designed for, it unfortunately cannot be used as a pure speech-to-text service. Instead of returning the text spoken, the service returns an audio file with a response, which is not useful for us. After experimenting with the service, we managed to extract the text that was transcribed from the audio file we sent in. However, it turns out that the Alexa Voice Service can only be used when the user says the words "Alexa, tell DeckAssistant to…" before issuing a command. That is not very usable for our purposes, so we chose not to work with this service.
● AT&T Speech: This system allowed us to configure a vocabulary and a grammar that made the speech recognition of our specific commands very accurate. Like the IBM Watson Speech API, the transcription of the audio file we sent in was returned with minimal delay. Therefore, we ended up using this library for our speech recognizer. The one downside of this library was that we had to pay a fee to receive Premium Access for the Speech API (AT&T Developer Premium Access costs $99).
As explained in Section 5.2.1, recognition is performed after each spoken command                       
followed by a brief period of silence. Once the AT&T Speech library recognizes a phrase in our                                 
grammar, we pass the transcribed text into our parser. 
5.2.3. Parsing Speech Commands 
The parser extracts metadata that represents the type of the command being issued as well                             
as any other relevant information. Each transcribed text that is sent to the parser is called a base                                   
command. Out of all the base commands, only the Decision Command (Table 1) represents a                             
meaningful action by itself. The parser interprets the rest of the commands in two stages, which                               
allows for gestural input alongside speech. We call these combined commands. Let’s look at an                             
example where we have the command “Move this aircraft, over there”. When issuing this                           
1
 AT&T Developer Premium Access costs $99.  
40 
command, the deck handler points at the aircraft to be moved and says “Move this aircraft...”,                               
followed by “...over there” while pointing at the destination. In the meantime, the parser sends                             
the metadata extracted from the text to the Action Manager, which holds the information until                             
two base commands can be combined into a single command for an action to be taken. In                                 
addition, the Action Manager provides visual and auditory feedback to the deck handler during                           
the process. A full breakdown of speech commands are found in [1] and listed here: 
Base Commands

Name | Function | Example(s)
Move Command | Selects aircraft to be moved. | "Move this C-2…"
Location Command | Selects destination of move. | "…to the fantail."
Launch Command | Selects catapult(s) to launch aircraft on. | "…to launch on Catapult 2."
Decision Command | Responds to a question from DeckAssistant. | "Yes", "No", "Okay".

Combined Commands

Name | Function | Combination
Move to Location Command | Moves aircraft to a specified destination. | Move Command + Location Command
Move Aircraft to Launch Command | Moves aircraft to launch on one or more catapults. | Move Command + Launch Command

Table 1: Set of commands that are recognized by DeckAssistant.
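To illustrate the kind of metadata extraction described above, here is a simplified Java sketch using regular expressions. The patterns and the metadata class are assumptions for exposition; DeckAssistant's grammar covers more phrasings than the handful shown here.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Simplified sketch of base-command parsing. The real parser is driven by the
    // configured grammar; these patterns only illustrate the idea.
    public class SpeechParser {

        public enum Type { MOVE, LOCATION, LAUNCH, DECISION }

        public static class BaseCommand {
            public final Type type;
            public final String argument;   // tail number, region name, catapult number, or yes/no
            BaseCommand(Type type, String argument) {
                this.type = type;
                this.argument = argument;
            }
        }

        private static final Pattern MOVE =
                Pattern.compile("move (this \\S+|aircraft number-(\\d+))");
        private static final Pattern LOCATION = Pattern.compile("to the (\\w+)");
        private static final Pattern LAUNCH   = Pattern.compile("to launch on catapult (\\d+)");
        private static final Pattern DECISION = Pattern.compile("^(yes|no|okay)$");

        /** Extracts the base commands found in one transcribed utterance. */
        public List<BaseCommand> parse(String transcript) {
            String text = transcript.trim().toLowerCase();
            List<BaseCommand> found = new ArrayList<BaseCommand>();
            Matcher m;
            if ((m = DECISION.matcher(text)).find()) {
                found.add(new BaseCommand(Type.DECISION, m.group(1)));
                return found;                                       // a decision stands on its own
            }
            if ((m = MOVE.matcher(text)).find()) {
                // Tail number, or null when the aircraft is selected by pointing ("this ...").
                found.add(new BaseCommand(Type.MOVE, m.group(2)));
            }
            if ((m = LAUNCH.matcher(text)).find()) {
                found.add(new BaseCommand(Type.LAUNCH, m.group(1)));
            } else if ((m = LOCATION.matcher(text)).find()) {
                found.add(new BaseCommand(Type.LOCATION, m.group(1)));
            }
            return found;       // the Action Manager combines these into a single deck action
        }
    }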
5.2.4. Speech Recognition Stack in Action 
In Figure 20, we outline how the Speech Recognition Stack works with the Action Manager to create deck actions. As already discussed in Section 5.2.1, the SoX process that we run is constantly recording and waiting for commands. Figure 20 uses a command that moves an aircraft to a deck region as an example. When the deck handler issues the first command, the SoX process sends the speech recognizer a WAV file to transcribe. The transcribed text is then sent to the speech parser, which extracts the metadata. Once the speech recognizer is done transcribing, it restarts the recording of sound through the SoX command to listen for future commands. Step 1 in Figure 20 shows that the metadata extracted represents a Move Command for an aircraft that is being pointed at. The Action Manager receives this information at Step 2, understands that it is a base command, and waits for another command so the two can be combined into a single command that represents a deck action. In the meantime, the Action Manager consults the Selection Engine at Step 3 to get the information for the aircraft that is being pointed at. This allows the Action Manager to highlight the aircraft that is selected. Meanwhile, the deck handler speaks the rest of the command, which is sent to the parser. Step 4 shows the metadata that is assigned to the second base command spoken. In this case, we have a Location Command and the name of the deck region that is the destination. In Step 5, the Action Manager constructs the final command with the second base command, and it fetches the destination information through the Deck Object. Finally, a Deck Action is created (Step 7) with the information gathered from the Speech Recognition Stack and other modules.
Implementation of Deck Actions is described in [1, Section 7].  
 
 
Figure 20: A summary of how the speech recognition stack works. 
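As described above, the SoX process records continuously, produces a WAV file once the deck handler finishes speaking, and is then restarted to listen for the next command. The sketch below shows one way such a loop could be driven from Java; the SoX invocation, its silence-detection parameters, and the recognizer and parser calls shown in comments are assumptions for illustration, not DeckAssistant's exact configuration.

// Minimal sketch of a recording loop that shells out to SoX, captures one
// spoken command as a WAV file, and hands it off for transcription.
// The recognizer/parser hooks are assumed interfaces, not DeckAssistant's API.
import java.io.File;
import java.io.IOException;

public class CommandRecorder {
    public static void main(String[] args) throws IOException, InterruptedException {
        while (true) {
            File wav = File.createTempFile("command", ".wav");
            // Record from the default audio device (-d); the "silence" effect
            // trims leading silence and stops after ~1 s of quiet once speech
            // has started. The thresholds here are illustrative assumptions.
            Process sox = new ProcessBuilder(
                    "sox", "-d", wav.getAbsolutePath(),
                    "silence", "1", "0.1", "1%", "1", "1.0", "1%")
                    .redirectErrorStream(true)
                    .start();
            sox.waitFor();   // returns when SoX detects the end of the utterance

            // Hand the finished WAV file to the speech recognizer, then parse
            // the transcript and forward the metadata to the Action Manager:
            // String transcript = recognizer.transcribe(wav);    // assumed call
            // actionManager.accept(parser.parse(transcript));    // assumed call
            wav.delete();
        }
    }
}

Blocking on waitFor() is what keeps the loop command-driven: control returns only when SoX has detected the end of an utterance, so each iteration corresponds to exactly one spoken command.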
 
 
 
 
6. Related Work 
This section presents the previous work that inspired the DeckAssistant project.
6.1. Navy ADMACS 
As mentioned in Section 1.2.2, the Navy is moving towards a more technologically advanced and connected system, ADMACS [2], a real-time data management system connecting the carrier's air department, ship divisions, and the sailors who manage aircraft launch and recovery operations.
6.2. Deck Heuristic Action Planner 
Ryan et al. have developed ‘a decision support system for flight deck operations that utilizes a conventional integer linear program-based planning algorithm’ [5]. In this system, a human operator inputs the end goals as well as constraints, and the algorithm returns a proposed schedule of operations for the operator’s approval. Although their experiments showed that human heuristics perform better than the plans produced by the algorithm, human decisions are usually conservative, and the system can offer alternate plans. This is an early attempt to aid planning on aircraft carriers.
7. Conclusion 
In this thesis, we introduced improvements to DeckAssistant, a system that provides a                         
traditional Ouija board interface by displaying a digital rendering of an aircraft carrier deck that                             
assists deck handlers in planning deck operations. DeckAssistant has a large digital tabletop                         
display that shows the status of the deck and has an understanding of certain deck actions for                                 
scenario planning. To preserve the conventional way of interacting with the old-school Ouija
board where deck handlers move aircraft by hand, the system takes advantage of multiple modes                             
of interaction. Deck handlers plan strategies by pointing at aircraft, gesturing and talking to the                             
system. The system responds with its own speech and updates the display to show the                             
consequences of the actions taken by the handlers. The system can also be used to simulate                               
certain scenarios during the planning process. The multimodal interaction described here creates                       
a communication of sorts between deck handlers and the system. 
Our work includes three improvements to the initial version of DeckAssistant built by Kojo Acquah [1]. The first is the introduction of the Leap Motion Sensor for pointing detection and gesture recognition. We presented our subjective assessment of why the Leap Motion device performs better than the Microsoft Kinect and explained how we achieve pointing detection and gesture recognition using the device. The second improvement is better speech synthesis, through the introduction of a new speech synthesis library that provides high-quality pronunciation and clarity in speech. The third improvement is better speech recognition: we discussed the use cases of several speech recognition libraries, identified the one best suited to our purposes, and explained how to integrate this new library into the current system with our own method of recording voice.
7.1. Future Work 
While the current version of DeckAssistant focuses only on aircraft movement based on deck handler actions, future versions may implement algorithms that let the system simulate an optimal ordering of operations for a given end goal, while accounting for deck and aircraft status such as maintenance needs.

Currently, DeckAssistant’s display, created by the four downward-facing projectors mounted over the tabletop (discussed in Section 3.1), has a high pixel resolution, but it is not as seamless as it should be. The ScalableDesktop software is used to perform automatic edge-blending of the four projected images; however, the regions where the projectors overlap are still visible. Moreover, the ScalableDesktop software has to be run for calibration every time a user starts DeckViewer, and the brightness of the display is low. Instead of the projectors and the tabletop surface, a high-resolution touchscreen LED TV could be mounted flat on a table. This would provide a seamless display free of projector overlaps and remove the need for time-consuming calibration. In addition, the touchscreen would make it possible to introduce drawing gestures with which the deck handler could draw out aircraft movements as well as take notes on the screen.
8. References 
[1] Kojo Acquah. Towards a Multimodal Ouija Board for Aircraft Carrier Deck Operations. June 2015.

[2] US Navy Air Systems Command. Navy Training System Plan for Aviation Data Management and Control System. March 2002.

[3] Leap Motion. Leap Motion for Mac and PC. November 2015.

[4] SoX Documentation. http://sox.sourceforge.net/Docs/Documentation. February 2013.

[5] Ryan et al. Comparing the Performance of Expert User Heuristics and an Integer Linear Program in Aircraft Carrier Deck Operations. 2013.

[6] Ziezulewicz, Geoff. "Old-school 'Ouija Board' Being Phased Out on Navy Carriers." Stars and Stripes, 10 Aug. 2011. Web. 3 Mar. 2016.

[7] Microsoft. Kinect for Windows Sensor Components and Specifications. Web. 7 Mar. 2016.

[8] Khoshelham, K.; Elberink, S. O. Accuracy and Resolution of Kinect Depth Data for Indoor Mapping Applications. Sensors 2012, 12, 1437–1454.

[9] Weichert, F.; Bachmann, D.; Rudak, B.; Fisseler, D. Analysis of the Accuracy and Robustness of the Leap Motion Controller. Sensors 2013, 13, 6380–6393.
9. Appendix 
9.1. Code and Documentation 
The source code of DeckAssistant, documentation on how to get up and running with the system, and the DeckAssistant Software Guide are available on GitHub: https://github.mit.edu/MUG-CSAIL/DeckViewer.
More Related Content

What's hot

The Green Evolution of EMOTIVE Cloud EMOTIVE Cloud: The BSC’s IaaS open-sourc...
The Green Evolution of EMOTIVE Cloud EMOTIVE Cloud: The BSC’s IaaS open-sourc...The Green Evolution of EMOTIVE Cloud EMOTIVE Cloud: The BSC’s IaaS open-sourc...
The Green Evolution of EMOTIVE Cloud EMOTIVE Cloud: The BSC’s IaaS open-sourc...Alex Vaqué
 
Dell Data Migration A Technical White Paper
Dell Data Migration  A Technical White PaperDell Data Migration  A Technical White Paper
Dell Data Migration A Technical White Papernomanc
 
Thesis:"DLAlert and Information Alert System for Digital Libraries"
Thesis:"DLAlert and Information Alert System for Digital Libraries"Thesis:"DLAlert and Information Alert System for Digital Libraries"
Thesis:"DLAlert and Information Alert System for Digital Libraries"Ioannis Alexakis
 
Transforming a Paper-Based Library System to Digital in Example of Herat Univ...
Transforming a Paper-Based Library System to Digital in Example of Herat Univ...Transforming a Paper-Based Library System to Digital in Example of Herat Univ...
Transforming a Paper-Based Library System to Digital in Example of Herat Univ...Abdul Rahman Sherzad
 
Android Face Recognition App Locker
Android Face Recognition App LockerAndroid Face Recognition App Locker
Android Face Recognition App LockerAnkur Mogra
 
Quick testprofessional book_preview
Quick testprofessional book_previewQuick testprofessional book_preview
Quick testprofessional book_previewSaurabh Singh
 
EXPERIENCE CONCERNING AVAILABILITY AND RELIABILITY OF DIGITAL SUBSTATION AUTO...
EXPERIENCE CONCERNING AVAILABILITY AND RELIABILITY OF DIGITAL SUBSTATION AUTO...EXPERIENCE CONCERNING AVAILABILITY AND RELIABILITY OF DIGITAL SUBSTATION AUTO...
EXPERIENCE CONCERNING AVAILABILITY AND RELIABILITY OF DIGITAL SUBSTATION AUTO...Power System Operation
 
Water Treatment Unit Selection, Sizing and Troubleshooting
Water Treatment Unit Selection, Sizing and Troubleshooting Water Treatment Unit Selection, Sizing and Troubleshooting
Water Treatment Unit Selection, Sizing and Troubleshooting Karl Kolmetz
 
Programming.clojure
Programming.clojureProgramming.clojure
Programming.clojureKwanzoo Dev
 

What's hot (10)

The Green Evolution of EMOTIVE Cloud EMOTIVE Cloud: The BSC’s IaaS open-sourc...
The Green Evolution of EMOTIVE Cloud EMOTIVE Cloud: The BSC’s IaaS open-sourc...The Green Evolution of EMOTIVE Cloud EMOTIVE Cloud: The BSC’s IaaS open-sourc...
The Green Evolution of EMOTIVE Cloud EMOTIVE Cloud: The BSC’s IaaS open-sourc...
 
Dell Data Migration A Technical White Paper
Dell Data Migration  A Technical White PaperDell Data Migration  A Technical White Paper
Dell Data Migration A Technical White Paper
 
Thesis:"DLAlert and Information Alert System for Digital Libraries"
Thesis:"DLAlert and Information Alert System for Digital Libraries"Thesis:"DLAlert and Information Alert System for Digital Libraries"
Thesis:"DLAlert and Information Alert System for Digital Libraries"
 
Transforming a Paper-Based Library System to Digital in Example of Herat Univ...
Transforming a Paper-Based Library System to Digital in Example of Herat Univ...Transforming a Paper-Based Library System to Digital in Example of Herat Univ...
Transforming a Paper-Based Library System to Digital in Example of Herat Univ...
 
Android Face Recognition App Locker
Android Face Recognition App LockerAndroid Face Recognition App Locker
Android Face Recognition App Locker
 
Quick testprofessional book_preview
Quick testprofessional book_previewQuick testprofessional book_preview
Quick testprofessional book_preview
 
EXPERIENCE CONCERNING AVAILABILITY AND RELIABILITY OF DIGITAL SUBSTATION AUTO...
EXPERIENCE CONCERNING AVAILABILITY AND RELIABILITY OF DIGITAL SUBSTATION AUTO...EXPERIENCE CONCERNING AVAILABILITY AND RELIABILITY OF DIGITAL SUBSTATION AUTO...
EXPERIENCE CONCERNING AVAILABILITY AND RELIABILITY OF DIGITAL SUBSTATION AUTO...
 
Thesis_Final_2013
Thesis_Final_2013Thesis_Final_2013
Thesis_Final_2013
 
Water Treatment Unit Selection, Sizing and Troubleshooting
Water Treatment Unit Selection, Sizing and Troubleshooting Water Treatment Unit Selection, Sizing and Troubleshooting
Water Treatment Unit Selection, Sizing and Troubleshooting
 
Programming.clojure
Programming.clojureProgramming.clojure
Programming.clojure
 

Similar to Thesis (20)

KHAN_FAHAD_FL14
KHAN_FAHAD_FL14KHAN_FAHAD_FL14
KHAN_FAHAD_FL14
 
An Analysis of Component-based Software Development -Maximize the reuse of ex...
An Analysis of Component-based Software Development -Maximize the reuse of ex...An Analysis of Component-based Software Development -Maximize the reuse of ex...
An Analysis of Component-based Software Development -Maximize the reuse of ex...
 
CS499_JULIUS_J_FINAL_YEAR_PROJETCT_L_DRAFT
CS499_JULIUS_J_FINAL_YEAR_PROJETCT_L_DRAFTCS499_JULIUS_J_FINAL_YEAR_PROJETCT_L_DRAFT
CS499_JULIUS_J_FINAL_YEAR_PROJETCT_L_DRAFT
 
thesis_online
thesis_onlinethesis_online
thesis_online
 
masteroppgave_larsbrusletto
masteroppgave_larsbruslettomasteroppgave_larsbrusletto
masteroppgave_larsbrusletto
 
DMDI
DMDIDMDI
DMDI
 
Master's Thesis
Master's ThesisMaster's Thesis
Master's Thesis
 
NIC Project Final Report
NIC Project Final ReportNIC Project Final Report
NIC Project Final Report
 
Wikis as water coolers?
Wikis as water coolers?Wikis as water coolers?
Wikis as water coolers?
 
Computing Science Dissertation
Computing Science DissertationComputing Science Dissertation
Computing Science Dissertation
 
Tfg ros-sis
Tfg ros-sisTfg ros-sis
Tfg ros-sis
 
Thesis
ThesisThesis
Thesis
 
Pw user guide
Pw user guidePw user guide
Pw user guide
 
Report final
Report finalReport final
Report final
 
GViz - Project Report
GViz - Project ReportGViz - Project Report
GViz - Project Report
 
Course Modules.pdf
Course Modules.pdfCourse Modules.pdf
Course Modules.pdf
 
PhD Thesis
PhD ThesisPhD Thesis
PhD Thesis
 
lernOS for You Guide (Version 1.4)
lernOS for You Guide (Version 1.4)lernOS for You Guide (Version 1.4)
lernOS for You Guide (Version 1.4)
 
1227201 Report
1227201 Report1227201 Report
1227201 Report
 
ANSYS Fluent - CFD Final year thesis
ANSYS Fluent - CFD Final year thesisANSYS Fluent - CFD Final year thesis
ANSYS Fluent - CFD Final year thesis
 

Thesis

  • 1. A Multimodal Ouija Board for Aircraft Carrier Deck Operations    by  Birkan Uzun  S.B., C.S. M.I.T., 2015      Submitted to the   Department of Electrical Engineering and Computer Science  in Partial Fulfillment of the Requirements for the Degree of   Master of Engineering in Computer Science and Engineering  at the  Massachusetts Institute of Technology  June 2016  Copyright 2016 Birkan Uzun. All rights reserved.    The author hereby grants to M.I.T. permission to reproduce and to distribute publicly paper and  electronic copies of this thesis document in whole and in part in any medium now known or  hereafter created.       Author ……………………………………………………………………………………………...  Department of Electrical Engineering and Computer Science  April 6, 2016    Certified by ………………………………………………………………………………………...   Randall Davis, Professor  Thesis Supervisor    Accepted by ………………………………………………………………………………………..   Dr. Christopher J. Terman  Chairman, Masters of Engineering Thesis Committee          1 
  • 3. A Multimodal Ouija Board for Aircraft Carrier Deck Operations    by  Birkan Uzun    Submitted to the   Department of Electrical Engineering and Computer Science  April 6, 2016  in Partial Fulfillment of the Requirements for the Degree of   Master of Engineering in Computer Science and Engineering      Abstract  In this thesis, we present improvements to DeckAssistant, a system that provides a traditional                            Ouija board interface by displaying a digital rendering of an aircraft carrier deck that assists deck                                handlers in planning deck operations. DeckAssistant has a large digital tabletop display that                          shows the status of the deck and has an understanding of certain deck actions for scenario                                planning. To preserve the conventional way of interacting with the old­school Ouija board where                            deck handlers move aircraft by hand, the system takes advantage of multiple modes of                            interaction. Deck handlers plan strategies by pointing at aircraft, gesturing and talking to the                            system. The system responds with its own speech and gestures, and it updates the display to                                show the consequences of the actions taken by the handlers. The system can also be used to                                  simulate certain scenarios during the planning process. The multimodal interaction described                      here creates a communication of sorts between deck handlers and the system. Our contributions                            include improvements in hand­tracking, speech synthesis and speech recognition.      3 
  • 5. Acknowledgements  Foremost, I would like to thank my advisor, Professor Randall Davis, for the support of                              my work, for his patience, motivation and knowledge. His door was always open whenever I had                                a question about my research. He consistently allowed this research to be my own work, but                                steered me in the right direction with his meaningful insights whenever he thought I needed it.   I would also like to thank Jake Barnwell for helping with the development environment                            setup and documentation.   Finally, I must express my gratitude to my parents and friends who supported me                            throughout my years of study. This accomplishment would never be possible without them.                                    5 
  • 7. Contents  1. Introduction……………………………………………………………………………..13  1.1. Overview…………………………………………………………………………13  1.2. Background and Motivation……………………………….…..….………..….....14  1.2.1. Ouija Board History and Use…………………………………………….14  1.2.2. Naval Push for Digital Information on Decks………………………....…15  1.2.3. A Multimodal Ouija Board………………………………………………16  1.3. System Demonstration………………………………………………………...…17  1.4. Thesis Outline……………………………………………………………………20  2. Deck Assistant Functionality…………………………………………………………..21  2.1. Actions in DeckAssistant……...………………………………………………....21  2.2. Deck Environment………...……………………………………………………..22  2.2.1. Deck and Space Understanding…....…………………………………….22  2.2.2. Aircraft and Destination Selection…………..…………………………...23  2.2.3. Path Calculation and Rerouting.…………………………………………23  2.3. Multimodal Interaction..…………………………………………………………24  2.3.1. Input………...……………………………………………………………24  2.3.2. Output………...………………………………………………………….24  3. System Implementation…….…………………………………………………………..28  3.1. Hardware………………….…...…………………………………………………28  3.2. Software……………….......……………………………………………………..29  3.2.1. Libraries……………………....…....…………………………………….29  7 
  • 8. 3.2.2. Architecture……………………...…………..…………………………...30  4. Hand Tracking...……………………….…….…..………………………......…..……..32  4.1. The Leap Motion Sensor…....……………………………………………………33  4.1.1. Pointing Detection………....……………………………....…………….34  4.1.2. Gesture Detection…………………………....…………………………...35  5. Speech Synthesis and Recognition……………………………………………………..37  5.1. Speech Synthesis……….……...…………………………………………………37  5.2. Speech Recognition…..…...…………………………………………………..…38  5.2.1. Recording Sound………………......……………………………………..38  5.2.2. Choosing a Speech Recognition Library..…..…………………………...39  5.2.3. Parsing Speech Commands…....…………………………………………40  5.2.4. Speech Recognition Stack in Action……………………………………..41  6. Related Work…………………….……………………………………………………..44  6.1. Navy ADMACS.……….……...…………………………………………………44  6.2. Deck Heuristic Action Planner……....…………………………………………..44  7. Conclusion….…………………….……………………………………………………..45  7.1. Future Work…...……….……...…………………………………………………46  8. References…..…………………….……………………………………………………..47  9. Appendix…....…………………….……………………………………………………..48  9.1. Code and Documentation....…...…………………………………………………48        8 
  • 9. List of Figures   Figure 1: Deck handlers collaboratively operating on an Ouija Board. Source: Google Images..15  Figure 2: The ADMACS Ouija board. Source: Google Images…………………………………16  Figure 3:  DeckAssistant’s tabletop display with the digital rendering of the deck [1]...........…..17  Figure 4: A deck handler using DeckAssistant with hand gestures and speech commands [1]....18   Figure 5: The initial arrangement of the deck [1]..........................................................................19  Figure 6: Deck handler points at the aircraft to be moved while speaking the command [1].......19  Figure 7: DeckAssistant uses graphics to tell the deck handler that the path to destination is                                blocked [1].....................................................................................................................................19  Figure 8: DeckAssistant displays an alternate location for the F­18 that is blocking the path                              [1]...............................................………………………………………………………………....20  Figure 9:  The logic for moving aircraft [1]...................................................................................22  Figure 10: Regions on an aircraft carrier’s deck. Source: Google Images....................................23  Figure 11: (a) Orange dot represents where the user is pointing at. (b) Aircraft being hovered                                over is highlighted green [1]..........................................................................................................25  Figure 12: (a) Single aircraft selected. (b) Multiple aircraft selected [1]......................................25  Figure 13: Aircraft circled in red, meaning there is not enough room in region [1].....................26  Figure 14: Alternate region to move the C­2 is highlighted in blue [1]........................................27  Figure 15: The hardware used in DeckAssistant………………………………………………...28  Figure 16: DeckAssistant software architecture overview……………………………………....31  Figure 17: The Leap Motion Sensor mounted on the edge of the table top display……………..33  9 
  • 10. Figure 18: Leap Motion’s InteractionBox, colored in red. Source: Leap Motion Developer                        Portal……………………………………………………………………………………………..35  Figure 19: Demonstration of multiple aircraft selection with the pinch gesture………………...36  Figure 20: A summary of how the speech recognition stack works……………………………..43                                10 
  • 13. 1. Introduction  1.1. Overview  In this thesis, we present improvements to DeckAssistant, a digital aircraft carrier Ouija                          Board interface that aids deck handlers with planning deck operations. DeckAssistant supports                        multiple modes of interaction, aiming to improve the user experience over the traditional Ouija                            Boards. Using hand­tracking, gesture recognition and speech recognition, it allows deck handlers                        to plan deck operations by pointing at aircraft, gesturing and talking to the system. It responds                                with its own speech using speech synthesis and updates the display, which is a digital rendering                                of the aircraft carrier deck, to show results when deck handlers take action. The multimodal                              interaction described here creates a communication of sorts between deck handlers and the                          system. DeckAssistant has an understanding of deck objects and operations, and can be used to                              simulate certain scenarios during the planning process.  The initial work on DeckAssistant was done by Kojo Acquah, and we build upon his                              implementation [1]. Our work makes the following contributions to the fields of                        Human­Computer Interaction and Intelligent User Interfaces:  ● It discusses how using the Leap Motion Sensor is an improvement over the Microsoft                            Kinect in terms of hand­tracking, pointing and gesture recognition.  ● It presents a speech synthesis API which generates speech that has high pronunciation                          quality and clarity. It investigates several speech recognition APIs, argues which one is                          the most applicable, and introduces a way of enabling voice­activated speech recognition.  13 
  • 14. ● Thanks to the refinements in hand­tracking and speech, it provides a natural, multimodal                          way of interaction with the first large­scale Ouija Board alternative that has been built to                              help with planning deck operations.  1.2. Background and Motivation  1.2.1. Ouija Board History and Use  The flight deck of an aircraft carrier is a complex scene, riddled with incoming aircraft,                              personnel moving around to take care of a variety of tasks and the ever present risk of hazards                                    and calamity. Flight Deck Control (FDC) is where the deck scene is coordinated and during                              flight operations it's one of the busiest places on the ship. The deck handlers in FDC send                                  instructions to the aircraft directors on the flight deck who manage all aircraft movement,                            placement and maintenance for the  deck regions they are responsible for.   FDC is filled with computer screens and video displays of all that is occurring outside on                                deck, but it is also home to one of the most crucial pieces of equipment in the Navy, the Ouija                                        board (Figure 1). The Ouija board is a waist­high replica of the flight deck at 1/16 scale that has                                      all the markings of the flight deck, as well as its full compliment of aircraft — all in cutout                                      models, and all tagged with items like thumbtacks and bolts to designate their status. The board                                offers an immediate glimpse of the deck status and allows the deck handlers in charge the ability                                  to manipulate the model deck objects and make planning decisions, should the need arise. The                              board has been in use since World War II and has provided a platform of collaboration for deck                                    handlers in terms of strategy planning for various scenarios on deck.  It is widely understood that the first round of damage to a ship will likely take out the                                    electronics; so to ensure the ship remains functional in battle, everything possible has a                            14 
  • 15. mechanical backup. Even though the traditional board has an advantage of being immune to                            electronic failures, there is potential for digital Ouija board technology to enhance the                          deck­operation­planning functionality and experience.    Figure 1: Deck handlers collaboratively operating on an Ouija Board. Source: Google Images.  1.2.2.  Naval Push for Digital Information on Decks  Even though the Ouija board has been used to track aircraft movement on aircraft carriers                              for over seventy years, the Navy is working on a computerized replacement due to limitations of                                the current model. As one of the simplest systems aboard Navy ships, the Ouija boards can only                                  be updated manually, i.e. when the deck handlers move models of aircraft and other assets                              around the model deck to match the movements of the real­life counterparts. The board does not                                offer any task automation, information processing or validation to help with strategy planning for                            various deck scenarios.     15 
  • 16.   Figure 2: The ADMACS Ouija board. Source: Google Images.  The new Ouija board replacement (Figure 2) is part of the Aviation Data Management                            and Control System (ADMACS) [2], a set of electronic upgrades for carriers designed to make                              use of the latest technologies. This system requires the deck handler to track flight deck activity                                via computer, working with a monitor that will be fed data directly from the flight deck. In                                  addition, the deck handler can move aircraft around on the simulated deck view using mouse and                                keyboard.   1.2.3. A Multimodal Ouija Board  The ADMACS Ouija board fixes the problem of updating the deck status in real­time                            without any manual work. It also allows the deck handlers to move aircraft on the simulated deck                                  view using mouse and keyboard as noted. However, most deck handlers are apparently skeptical                            of replacing the existing system and they think that things that are not broken should not be fixed                                    [6]. Considering these facts, imagine a new Ouija board with a large digital tabletop display that                                could show the status of the deck and had an understanding of certain deck actions for scenario                                  planning. To preserve the conventional way of interacting with the old­school Ouija board where                            16 
  • 17. deck handlers move aircraft by hand, the system would take advantage of multiple modes of                              interaction. Utilizing hand­tracking and speech recognition techniques, the system could let deck                        handlers point at objects on deck and speak their commands. In return, the system could respond                                with its own synthesized speech and update the graphics to illustrate the consequences of the                              commands given by the deck handlers. This would create a two­way communication between the                            system and the deck handlers.    1.3. System Demonstration  To demonstrate how the multimodal Ouija Board discussed in Section 1.2.3 works in                          practice and preview DeckAssistant in action, we take a look at an example scenario from [1]                                where a deck handler is trying to prepare an aircraft for launch on a catapult. The deck handler                                    needs to move the aircraft­to­be­launched to the catapult while moving other aircraft that are                            blocking the way to other locations on deck.  The system has a large tabletop display showing a digital, realistic rendering of an                            aircraft carrier deck with a complete set of aircraft (Figure 3).     Figure 3: DeckAssistant’s tabletop display with the digital rendering of the deck [1].  17 
  • 18. The deck handler stands in front of the table and issues commands using both hand                              gestures and speech (Figure 4). DeckAssistant uses either the Leap Motion Sensor (mounted on                            the edge of the display) or the Microsoft Kinect (mounted above the display) for hand­tracking.                              The deck handler wears a wireless Bluetooth headset that supports a two­way conversation with                            the system through speech.    Figure 4: A deck handler using DeckAssistant with hand gestures and speech commands [1].   Figure 5 shows the initial aircraft arrangement of the deck. There are eleven F­18s (grey                              strike fighter jets) and two C­2s (white cargo aircraft) placed on the deck. There are four                                catapults at the front of the deck, and two of them are open. The deck handler will now try to                                        launch one of the C­2s on one of the open catapults, and that requires moving a C­2 from the                                      elevator, which is at the rear of the deck, to an open catapult, which is at the front of the deck.  After viewing the initial arrangement of the deck, the deck handler points at the aircraft to                                be moved, the lower C­2, and speaks the following command: “Move this C­2 to launch on                                Catapult 2”. The display shows where the deck handler is pointing at with an orange dot, and the                                    selected aircraft is highlighted in green (Figure 6).  18 
  • 19.   Figure 5: The initial arrangement of the deck [1].     Figure 6: Deck handler points at the aircraft to be moved while speaking the command [1].  Now, DeckAssistant does its analysis to figure out whether the command given by the                            deck handler can be accomplished without any extra action. In this case, there is an F­18                                blocking the path the C­2 needs to take to go to the catapult (Figure 7).      Figure 7: DeckAssistant uses graphics to tell the deck handler that the path to destination is blocked [1].   19 
  • 20. DeckAssistant knows that the F­18 has to be moved out of the way. It uses graphics and                                  synthesized speech to let the deck handler know that additional actions are needed to be taken                                and ask for the handler’s permission in the form of a yes­no question (Figure 8).     Figure 8: DeckAssistant displays an alternate location for the F­18 that is blocking the path [1].   The aircraft are moved in the simulation if the deck handler agrees to the actions                              proposed by the system. If not, the system reverts back to the state before the command. If the                                    deck handler does not like the action proposed by the system, they can cancel the command and                                  move aircraft around based on their own strategies. The goal of DeckAssistant here is to take                                care of small details while the deck handler focuses the more important deck operations without                              wasting time.   1.4. Thesis Outline  In the next section, we talk about what type of actions are available in DeckAssistant and                                how they are taken, what the system knows about the deck environment, and how the                              multimodal interaction works. Section 3 discusses the hardware and software used as well as                            introducing the software architecture behind DeckAssistant. Sections 4 and 5 look at                        implementation details discussing hand­tracking, speech synthesis and recognition. Section 6                    talks about related work. Section 7 discusses future work and concludes.        20 
  • 21. 2. DeckAssistant Functionality  This section gives an overview of actions available in DeckAssistant, discusses what                        DeckAssistant knows about the deck environment and the objects, and explains how the                          multimodal interaction happens.   2.1. Actions in DeckAssistant  The initial version of DeckAssistant focuses only on simple deck actions for aircraft                          movement and placement. These actions that allow deck handlers to perform tasks such as                            moving an aircraft from one location to another or preparing an aircraft for launch on a catapult.                                  These deck actions comprise the logic to perform a command given by the deck handler (Figure                                9). As the example in Section 1.3 suggests, these actions are built to be flexible and interactive.                                  This means that the deck handler is always consulted for their input during an action, they can                                  make alterations with additional commands, or they can suggest alternate actions if needed. The                            system takes care of the details, saving the deck handler’s time and allowing them to concentrate                                on more important tasks.   These are four actions available within DeckAssistant, as noted in [1]:   ● Moving aircraft from start to destination.  ● Finding an alternate location for aircraft to move if the intended destination is full.  ● Clearing a path for aircraft to move from start to end location.  ● Moving aircraft to launch on catapults.  21 
  • 22.   Figure 9: The logic for moving aircraft [1].  2.2. Deck Environment  DeckAssistant has an understanding of the deck environment, which includes various                      types of aircraft, regions on deck and paths between regions (See Chapter 4 of [1] for the                                  implementation details of the deck environment and objects).   2.2.1. Deck And Space Understanding  DeckAssistant’s user interface represents a scale model of a real deck just like a                            traditional Ouija Board. The system displays the status of aircraft on this user interface and use                                the same naming scheme that the deck handlers use for particular regions of the deck (Figure                                10). The deck handlers can thus refer to those regions by their names when using the system.                                  Each of these regions contain a set of parking spots in which the aircraft can reside. These                                  parking spots help the system determine the arrangement of parked aircraft and figure out the                              22 
  • 23. occupancy in a region. This means that the system knows if a region has enough room to move                                    aircraft to or if the path from one region to another is clear.    Figure 10: Regions on an aircraft carrier’s deck. Source: Google Images.   2.2.2. Aircraft and Destination Selection  Each aircraft on deck is a unique object that has a tail number (displayed on each                                aircraft), type, position, status and other information that is useful for the system’s simulation.                            Currently, we support two different types of aircraft within DeckAssistant: F­18s and C­2s.   Selection of aircraft can be done two ways. The deck handler can either point at the                                aircraft (single or multiple) as shown in the example in Section 1.3, or, they can refer to the                                    aircraft by their tail numbers, for instance, “Aircraft Number­8”.   Destination selection is similar. Since destinations are regions on the deck, they can be                            referred to by their names or they can be pointed at.  2.2.3. Path Calculation and Rerouting  During path planning, the system draws straight lines between regions and uses the                          wingspan length as the width of the path to make sure that there are no aircraft blocking the way                                      and that the aircraft to move can fit into its path.   23 
  • 24. If a path is clear but the destination does not have enough open parking spots, the system                                  suggests alternate destinations and routes, checking the nearest neighboring regions for open                        spots.  2.3. Multimodal Interaction  The goal of the multimodal interaction created by DeckAssistant’s user interface is to                          create a communication between the deck handler and the system. The input in this interaction is                                a combination of hand gestures and speech performed by the deck handler. The output is the                                system’s response with synthesized speech and graphical updates.   2.3.1. Input  DeckAssistant uses either the Leap Motion Sensor or the Microsoft Kinect for tracking                          hands. Hand­tracking allows the system to recognize certain gestures using the position of the                            hands and fingertips. Currently, the system can only interpret pointing gestures where the deck                            handler points at aircraft or regions on the deck.   Commands are spoken into the microphone of the ​wireless Bluetooth headset that the                          deck handler wears, allowing the deck handler to issue a command using speech alone. In this                                case, the deck handler has to provide the tail number of the aircraft to be moved as well as the                                        destination name. An example could be: “Move ​Aircraft Number­8 to the ​Fantail​”.                        Alternatively, the deck handler can combine speech with one or more pointing gestures. In this                              case, for example, the deck handler can point at an aircraft to be moved and say “Move ​this                                    aircraft”; and then he can point at the destination and say “​over there​”.   2.3.2. Output  The system is very responsive to any input. As soon as the deck handler does a pointing   24 
  • 25. gesture, an orange dot appears on the screen, indicating where the deck handler is                            pointing at (Figure 11 (a)). If the deck handler is pointing at an aircraft, the system highlights                                  that aircraft with a green color, indicating a potential for selection (Figure 11 (b)). Eventually, if                                the deck handler takes an action to move aircraft on deck, the selected aircraft are highlighted in                                  orange. As mentioned earlier, the deck handler can select multiple aircraft (Figure 12).                                                           (a)   (b)  Figure 11: (a) Orange dot represents where the user is pointing at. (b) Aircraft being hovered over is  highlighted green [1].                                                      (a)   (b)  Figure 12: (a) Single aircraft selected. (b) Multiple aircraft selected [1].  The system’s responses to the deck handler’s input depend on the type of action and the                                aircraft arrangement on deck. If a certain action can be processed without additional actions, the                              25 
  • 26. system completes it and confirms it by saying “Okay, done”. If the action cannot be completed                                for any reason, the system explains why using its synthesized speech and graphical updates, and                              asks for the deck handler’s permission to take an alternate action. In the case of deck handler                                  approval, the system updates the arrangement on deck. The deck handler declines the suggested                            alternate action, the system reverts back to its previous state before the deck handler issued their                                command.  Section 1.3 gave us an example of this scenario where the system warned the user of the                                  aircraft that was blocking the path to a catapult and it recommended an alternate spot to move the                                    aircraft blocking the way. When the deck handler approved, then it could move the aircraft to                                launch on the catapult.  Let’s take a look at another scenario. Figure 13 shows an example of a situation where a                                  C­2 cannot be moved to the fantail since there are no open parking spots there. The system                                  circles all the blocking aircraft in red, and suggests an alternate region on deck to move the C­2.                                    In that case, the new region is highlighted in blue and a clear path to it is drawn (Figure 14). If                                          the deck handler accepts this suggested region, the system moves the C­2 there. If not, it reverts                                  back to its original state and waits for new commands.     Figure 13: Aircraft circled in red, meaning there is not enough room in region [1].  26 
  • 28. 3. System Implementation  In this section, we introduce DeckAssistant’s hardware setup, the software libraries used                        and the software architecture design.  3.1. Hardware    Figure 15: The hardware used in DeckAssistant.  As it can be seen in Figure 15, DeckAssistant’s hardware setup consists of:  ● Four downward­facing Dell 5100MP projectors mounted over the tabletop. These                    projectors create a 42 by 32 inch seamless display with a 2800 x 2100 pixel resolution.  28 
  • 29. ● A white surface digitizer. The display is projected onto this surface.  ● A Leap Motion Sensor or a Microsoft Kinect (V1) for tracking hands over the table                              surface. The system can use either sensor.  ● A Logitech C920 Webcam for viewing the entire surface. This webcam is used to                            calibrate the seamless display using the ​ScalableDesktop Classic​ software.  ● A wireless Bluetooth headset for supporting a two­way conversation with the system.  This setup is powered by a Windows 7 desktop computer with an AMD Radeon HD 6870                                graphics card. It should be noted that the need for the surface digitizer, projectors and webcam                                would be eliminated if the system was configured to use a flat panel for the display.   3.2. Software   All of DeckAssistant’s code is written in Java 7 in the form of a stand­alone application.                                This application handles all the system functionality: graphics, speech recognition, speech                      synthesis, and gesture recognition.  3.2.1. Libraries  Four libraries are used to provide the desired functionality:  ● Processing: for graphics;  it is a fundamental part of our application framework.  ● AT&T Java Codekit: for speech recognition.  ● Microsoft Translator Java API: for speech synthesis.  ● Leap Motion Java SDK: provides the interface to the Leap Motion Controller sensor for                            hand­tracking.      29 
  • 30. 3.2.2. Architecture  DeckAssistant’s software architecture is structured around three stacks that handle the                      multimodal input and output. These three stacks run in parallel and are responsible for speech                              synthesis, speech recognition and hand­tracking. The Speech Synthesis Stack constructs                    sentences in response to a deck handler’s command and generates an audio file for that sentence                                that is played through the system’s speakers. The Speech Recognition Stack constantly listens for                            commands, does speech­to­text conversion and parses the text to figure out the command that                            was issued. The Hand­Tracking Stack interfaces either with the Leap Motion Sensor or the                            Microsoft Kinect, processes the data received and calculates the position of the user’s pointing                            finger over the display as well as detecting additional gestures. These three stacks each provide                              an API (Application Program Interface) so that the other components within DeckAssistant can                          communicate with them for a multimodal interaction.   Another crucial part of the architecture is the Action Manager component. The Action                          Manager’s job is to manipulate the deck by communicating with the three multimodal interaction                            stacks. Once a deck handler’s command is interpreted, it is passed into the Action Manager                              which updates the deck state and objects based on the command and responds by leveraging the                                Speech Synthesis Stack and graphics.   Finally, all of these stacks and components run on a Processing loop that executes every                              30 milliseconds. Each execution of this loop makes sure the multimodal input and output are                              processed. Figure 16 summarizes the software architecture. The ​DeckAssistant Software Guide                      (see Appendix for URL) details the implementation of each component within the system.   30 
  • 32. 4. Hand Tracking  In Chapter 5 of his thesis [1], Kojo Acquah discusses methods for tracking hands and                              recognizing pointing gestures using a Microsoft Kinect (V1). These initial hand­tracking                      methods of DeckAssistant can only recognize outstretched fingers on hands that are held mostly                            perpendicular to the focal plane of the camera. They do not work well with other hand poses,                                  leaving no way to recognize other gestures. Authors of [8] provide a detailed analysis of the                                accuracy and resolution of the Kinect sensor’s depth data. Their experimental results show that                            the random error in depth measurement increases with increasing distance to the sensor, ranging                            from a few millimeters to approximately 4 centimeters at the maximum range of the sensor. The                                quality of the data is also found to be affected by the low resolution of the depth measurements                                    that depend on the frame rate (30fps [7]). The authors thus suggest that the obtained accuracy, in                                  general, is sufficient for detecting arm and body gestures, but is not sufficient for precise finger                                tracking and hand gestures. Experimenting with DeckAssistant’s initial version to take certain                        actions, we note a laggy and low­accuracy hand­tracking performance by the Kinect sensor. In                            addition, the Kinect always has to be calibrated before DeckAssistant can be used. This is a                                time­consuming process. Finally, the current setup has a usability problem; when deck handlers                          stand in front of the tabletop and point at the aircraft on the display, their hands block the                                    projectors’ lights causing shadows in the display.   Authors of [9] present a study of the accuracy and robustness of the Leap Motion Sensor.                                They use an industrial robot with a reference pen allowing suitable position accuracy for the                              experiment. Their results show high precision (an overall average accuracy of 0.7mm) in                          fingertip position detection. Even though they do not achieve the accuracy of 0.01mm, as stated                              32 
  • 33. by the manufacturer [3], they claim that the Leap Motion Sensor performs better than the                              Microsoft Kinect in the same experiment.   This section describes our use of the Leap Motion Sensor, to track hands and recognize                              gestures, allowing for a high­degree of subjective robustness.   4.1. The Leap Motion Sensor  The Leap Motion Sensor is a 3” long USB device that tracks hand and finger motions. It                                  works by projecting infrared light upward from the device and detecting reflections using                          monochromatic infrared cameras. Its field of view extends from 25mm to 600mm above the                            device with a 150° spread and a high frame rate (>200fps) [3]. In addition, more information                                about the hands is provided by the Application Programming Interface (API) of the Leap Motion                              Sensor than the Microsoft Kinect’s (V1).    Figure 17: The Leap Motion Sensor mounted on the edge of the tabletop display.  The Leap Motion Sensor is mounted on the edge of the tabletop display, as shown above                                in Figure 17. In this position, hands no longer block the projector’s lights, thereby eliminating                              33 
  • 34. the shadows in the display. The sensor also removes the need for calibration before use, enabling                                DeckAssistant to run without any extra work. Finally, thanks to its accuracy in finger­tracking,                            the sensor creates the opportunity for more hand gestures to express detail in deck actions (see                                Section 4.1.2).   4.1.1. Pointing Detection  The Leap Motion API provides us with motion tracking data as a series of frames. Each                                frame contains measured positions and other information about detected entities. Since we are                          interested in detecting pointing, we look at the fingers. The ​Pointableclass in the API reports                                the physical characteristics of detected extended fingers such as tip position, direction, etc. From                            these extended fingers, we choose the pointing finger as the one that is farthest toward the front                                  in the standard Leap Motion frame of reference. Once we have the pointing finger, we retrieve its                                  tip position by calling the ​Pointable class’ ​stabilizedTipPosition() method. This                    method applies smoothing and stabilization on the tip position, removing the the flickering                          caused by sudden hand movements and yielding a more accurate pointing detection that                          improves the interaction with our 2D visual content. The stabilized tip position lags behind the                              original tip position by a variable amount (not specified by the manufacturer) [3] depending on                              the speed of movement.  Finally, we map the tip position from the Leap Motion coordinate system to our system’s                              2D display. For this, we use the API class ​InteractionBox​. This class represents a                            cuboid­shaped region contained in the Leap Motion’s field of view (Figure 18). The                          InteractionBoxprovides normalized coordinates for detected entities within itself. Calling                    the ​normalizePoint()method of this class returns the normalized 3D coordinates for the tip                            34 
  • 35. position within the range [0...1]. Multiplying the X and Y components of these normalized                            coordinates by the our system’s screen dimensions, we complete the mapping process and obtain                            the 2D coordinates in our display. Algorithm 1 summarizes the pointing detection process.      Figure 18: Leap Motion’s InteractionBox, colored in red. Source: Leap Motion Developer Portal.    Algorithm 1: Summary of the pointing detection process in pseudocode.   As discussed in Section 2.3.2, the mapped tip position is displayed on the screen as an                                orange dot.   4.1.2. Gesture Detection  We implemented a new gesture for multiple aircraft selection, using a combination of                          pointing and pinching. The deck handler can point with their index finger while pinching with                              35 
  • 36. their thumb and middle finger to select multiple aircraft. We detect this gesture using the Leap                                Motion API’s ​pinchStrength()method. If the deck handler is pinching, the value returned                          by this method is 1, and 0 otherwise. However, since this value can be affected by movements of                                    the deck handler’s hand due to the device’s sensitivity, we apply a moving average method to                                make sure that the majority of the values we receive from the method indicate pinching. In                                addition, we recognize this gesture only if the user is pinching with the thumb and the middle                                  finger. We do this by iterating through the list of fingers in a frame and checking the distance                                    between their tip positions and the thumb’s tip position. The middle finger’s tip position in this                                case is supposed to have the smallest distance to the thumb’s tip position. The reason for this                                  check is that we do not want to recognize other hand poses as a pinch gesture. For example, if                                      the deck handler is pointing with their index finger and the other fingers are not extended, the                                  system might think that the user is pinching. However, that is not the case and the check we run                                      along with the moving average applied pinch strength value prevents the recognition of such                            cases. Figure 19 shows an example of multiple aircraft selection using the pinch gesture.     Figure 19: Demonstration of multiple aircraft selection with the pinch gesture.  36 
5. Speech Synthesis and Recognition

This section details the improvements in speech synthesis and speech recognition for DeckAssistant.

5.1. Speech Synthesis

The initial version of DeckAssistant, as discussed in [1, Section 6.1], used the FreeTTS package for speech synthesis. Even though FreeTTS provides an easy-to-use API and is compatible with many operating systems, it lacks pronunciation quality and clarity in speech. To solve this problem, we implemented a speech synthesizer interface that acts as a front for any speech synthesis library we plug in. One library that works successfully with our system is the Microsoft Translator API, a cloud-based machine translation service that supports multiple languages. Since our application uses only English, we do not use any of the translation features of the service; instead, we use it to generate a speech file from the text we feed in.

As explained in Section 3.2.2, speech is synthesized in response to a deck handler's commands. Any module in the software can call the Speech Synthesis Engine of the Speech Synthesis Stack to generate speech. Once called, the Speech Synthesis Engine passes the text to be spoken through the interface we created, which makes a request to the Microsoft Translator API and receives a WAV file that we play through our system's speakers. In the case of multiple speech synthesis requests, the system queues the requests and handles them in order. Using the Microsoft Translator API enables us to provide high-quality speech synthesis with clear voices. It should be noted that future developers of DeckAssistant can incorporate any speech synthesis library into the system with ease.
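The interface pattern can be sketched as follows in Java. The names SpeechSynthesizer, SpeechSynthesisEngine, and speak() are hypothetical stand-ins for DeckAssistant's actual classes, and WAV playback is left abstract.

```java
import java.io.File;
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Pluggable front for any speech synthesis library (hypothetical names).
interface SpeechSynthesizer {
    /** Converts the given text to speech and returns a playable WAV file. */
    File synthesize(String text) throws IOException;
}

// Queues synthesis requests and handles them in order, as described above.
class SpeechSynthesisEngine implements Runnable {
    private final SpeechSynthesizer synthesizer;
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

    SpeechSynthesisEngine(SpeechSynthesizer synthesizer) {
        this.synthesizer = synthesizer;
        new Thread(this, "speech-synthesis").start();
    }

    /** Any module can call this to have a sentence spoken. */
    void speak(String text) {
        pending.add(text);
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                File wav = synthesizer.synthesize(pending.take());
                play(wav);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // shut the engine down
            } catch (IOException e) {
                System.err.println("Speech synthesis failed: " + e.getMessage());
            }
        }
    }

    private void play(File wav) {
        // Playback elided; e.g., javax.sound.sampled could stream the WAV
        // through the system's speakers.
    }
}
```

A Microsoft Translator-backed implementation of SpeechSynthesizer would issue the HTTP request and write the returned audio to a temporary WAV file; a FreeTTS-backed one could be swapped in without touching the callers.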
5.2. Speech Recognition

The CMU Sphinx 4 library was used for recognizing speech in the initial version of DeckAssistant [1, Section 6.2]. Even though Sphinx provides an easy API to convert speech into text with acoustic models and a grammar (rules for specific phrase construction) of our choice, its speech recognition performance is poor in terms of both speed and accuracy. In the experiments we ran during development, we ended up repeating ourselves several times before the recognizer picked up what we were saying. In response, we introduced a speech recognizer interface that gives us the flexibility to use any speech recognition library. Other modules in DeckAssistant can call this interface and use the recognized speech as needed.

5.2.1. Recording Sound

The user can talk to DeckAssistant at any time, without the need for extra actions such as push-to-talk or gestures. For this reason, the system must constantly record from the microphone, detect when the user is done issuing a command, and generate a WAV file of the spoken command. Sphinx's Live Speech Recognizer took care of this by default. However, since the speech recognition library we decided to use (discussed in the next section) did not provide live speech recognition, we had to implement our own sound recorder that generates WAV files containing the spoken commands. For this task, we use SoX (Sound eXchange), a cross-platform command-line utility that can record and process audio files. The SoX command runs constantly in the background to record any sound. It stops recording once no sound is detected after the user has started speaking, then trims out certain noise bursts and writes the recorded speech to a WAV file, which is sent back to DeckAssistant. Once the speech recognizer is done with the speech-to-text operation, this background process is run again to record new commands. For more details about SoX, please refer to the SoX documentation [4].
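As a rough illustration of the recorder, the Java sketch below launches SoX's standard rec alias with the silence effect, which skips leading silence and stops once the speaker falls quiet. The class name and the specific thresholds are guesses for illustration, not the parameters DeckAssistant actually uses.

```java
import java.io.File;
import java.io.IOException;

// Sketch of the background sound recorder built on SoX. The thresholds passed to the
// silence effect below are illustrative guesses, not DeckAssistant's tuned values.
public class SoxCommandRecorder {

    /**
     * Blocks until the user has spoken and then fallen silent, and returns the
     * resulting WAV file for the speech recognizer to transcribe.
     */
    public File recordCommand() throws IOException, InterruptedException {
        File wav = File.createTempFile("spoken-command", ".wav");

        // `rec` records from the default microphone; the `silence` effect trims
        // leading silence, then stops roughly two seconds after the sound level
        // drops below the 3% threshold.
        ProcessBuilder recorder = new ProcessBuilder(
                "rec", wav.getAbsolutePath(),
                "silence", "1", "0.1", "3%", "1", "2.0", "3%");
        recorder.inheritIO();

        Process soxProcess = recorder.start();
        soxProcess.waitFor();
        return wav;
    }
}
```

In DeckAssistant this would run in a loop: each returned WAV file is handed to the speech recognizer, and recording restarts once transcription finishes.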
5.2.2. Choosing a Speech Recognition Library

To pick the most suitable speech recognition library for our needs, we experimented with four popular APIs:

● Google Speech: It did not provide an official API. We had to send an HTTP request to their service with a recorded WAV file to get the speech-to-text response, and we were limited to 50 requests per day. Even though the responses for the random sentences we used for testing were accurate, it did not work well with our own grammar, since the library does not provide any grammar configuration. A simple example is the sentence "Move this C-2": the recognizer thought we were saying "Move this see too". Since we had many similar issues with other commands, we decided not to use this library.

● IBM Watson Speech API: A brand new, easy-to-use API. It transcribed the incoming audio and sent it back to our system with minimal delay, and speech recognition seemed to improve as it heard more. However, like Google Speech, it did not provide any grammar configuration, which caused inaccuracy in recognizing certain commands in our system. Therefore, we did not use this library.

● Alexa Voice Service: Amazon recently made this service available. Even though the speech recognition works well for the purposes it was designed for, it unfortunately cannot be used as a pure speech-to-text service. Instead of returning the text spoken, the
service returns an audio file with a response, which is not useful for us. After some hacking with the service, we managed to extract the text transcribed from the audio file we sent in. However, it turns out that the Alexa Voice Service can only be used when the user says "Alexa, tell DeckAssistant to…" before issuing a command. That is not very usable for our purposes, so we chose not to work with this service.

● AT&T Speech: This system allowed us to configure a vocabulary and a grammar that made the speech recognition of our specific commands very accurate. Like the IBM Watson Speech API, the transcription of the audio file we sent in was returned with minimal delay. Therefore, we ended up using this library for our speech recognizer. The one downside of this library was that we had to pay a fee to receive Premium Access for the Speech API (AT&T Developer Premium Access costs $99).

As explained in Section 5.2.1, recognition is performed after each spoken command followed by a brief period of silence. Once the AT&T Speech library recognizes a phrase in our grammar, we pass the transcribed text into our parser.

5.2.3. Parsing Speech Commands

The parser extracts metadata that represents the type of the command being issued, as well as any other relevant information. Each transcribed text that is sent to the parser is called a base command. Of all the base commands, only the Decision Command (Table 1) represents a meaningful action by itself. The parser interprets the rest of the commands in two stages, which allows for gestural input alongside speech. We call these combined commands. Consider the command "Move this aircraft, over there". When issuing this
command, the deck handler points at the aircraft to be moved and says "Move this aircraft…", followed by "…over there" while pointing at the destination. In the meantime, the parser sends the metadata extracted from the text to the Action Manager, which holds the information until the two base commands can be combined into a single command for an action to be taken. In addition, the Action Manager provides visual and auditory feedback to the deck handler during the process. A full breakdown of the speech commands is found in [1] and listed in Table 1.

Base Commands

Name               Function                                      Example(s)
Move Command       Selects aircraft to be moved.                 "Move this C-2…"
Location Command   Selects destination of move.                  "…to the fantail."
Launch Command     Selects catapult(s) to launch aircraft on.    "…to launch on Catapult 2."
Decision Command   Responds to a question from DeckAssistant.    "Yes", "No", "Okay".

Combined Commands

Name                              Function                                             Combination
Move to Location Command          Moves aircraft to a specified destination.           Move Command + Location Command
Move Aircraft to Launch Command   Moves aircraft to launch on one or more catapults.   Move Command + Launch Command

Table 1: Set of commands that are recognized by DeckAssistant.
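As an illustration of the parsing step described before Table 1, the Java sketch below maps a transcribed phrase to one of the base commands. The regular expressions are deliberately simplistic placeholders, and the class and field names are hypothetical; the real parser works against the full grammar configured for the AT&T recognizer.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of mapping transcribed text to the base commands in Table 1.
public class SpeechCommandParser {

    public enum CommandType { MOVE, LOCATION, LAUNCH, DECISION }

    /** Metadata extracted from one base command. */
    public static final class BaseCommand {
        public final CommandType type;
        public final String argument;   // aircraft type, deck region, catapult, or answer

        BaseCommand(CommandType type, String argument) {
            this.type = type;
            this.argument = argument;
        }
    }

    private static final Pattern MOVE = Pattern.compile("move this (.+)", Pattern.CASE_INSENSITIVE);
    private static final Pattern LOCATION = Pattern.compile("to the (.+)", Pattern.CASE_INSENSITIVE);
    private static final Pattern LAUNCH = Pattern.compile("to launch on (.+)", Pattern.CASE_INSENSITIVE);
    private static final Pattern DECISION = Pattern.compile("yes|no|okay", Pattern.CASE_INSENSITIVE);

    /** Returns the base command for the phrase, or empty if it is not in the grammar. */
    public Optional<BaseCommand> parse(String transcribedText) {
        String text = transcribedText.trim();
        Matcher m;
        if ((m = MOVE.matcher(text)).matches()) {
            return Optional.of(new BaseCommand(CommandType.MOVE, m.group(1)));
        }
        if ((m = LAUNCH.matcher(text)).matches()) {
            return Optional.of(new BaseCommand(CommandType.LAUNCH, m.group(1)));
        }
        if ((m = LOCATION.matcher(text)).matches()) {
            return Optional.of(new BaseCommand(CommandType.LOCATION, m.group(1)));
        }
        if (DECISION.matcher(text).matches()) {
            return Optional.of(new BaseCommand(CommandType.DECISION, text.toLowerCase()));
        }
        return Optional.empty();
    }
}
```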
5.2.4. Speech Recognition Stack in Action

In Figure 20, we outline how the Speech Recognition Stack works with the Action Manager to create deck actions. As discussed in Section 5.2.1, the SoX process we run is constantly recording and waiting for commands. Figure 20 uses a command that moves an aircraft to a deck region as an example. When the deck handler issues the first command, the SoX process sends the speech recognizer a WAV file to transcribe. The transcribed text is then sent to the speech parser, which extracts the metadata. Once the speech recognizer is done transcribing, it restarts the sound recording through the SoX command to listen for future commands. Step 1 in Figure 20 shows that the extracted metadata represents a Move Command for an aircraft that is being pointed at. The Action Manager receives this information at Step 2, recognizes that it is a base command, and waits for another command so that the two can be combined into a single command representing a deck action. In the meantime, the Action Manager consults the Selection Engine at Step 3 to get the information for the aircraft being pointed at, which allows the Action Manager to highlight the selected aircraft. Meanwhile, the deck handler speaks the rest of the command, which is sent to the parser. Step 4 shows the metadata assigned to the second base command: in this case, a Location Command and the name of the deck region that is the destination. In Step 5, the Action Manager constructs the final command from the second base command and fetches the destination information through the Deck Object. Finally, a Deck Action is created (Step 7) with the information gathered from the Speech Recognition Stack and the other modules.

Implementation of Deck Actions is described in [1, Section 7].
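Building on the hypothetical SpeechCommandParser sketch above, the fragment below shows one way the Action Manager's two-stage combination could look. The method names and the interactions with the Selection Engine and Deck Object are stand-ins for the real modules, not DeckAssistant's actual implementation.

```java
// Sketch of the Action Manager combining two base commands into a deck action.
public class ActionManager {

    private SpeechCommandParser.BaseCommand pendingMove;  // first half of a combined command

    /** Called by the Speech Recognition Stack each time a base command is parsed. */
    public void onBaseCommand(SpeechCommandParser.BaseCommand command) {
        switch (command.type) {
            case DECISION:
                // Meaningful on its own: answer the question DeckAssistant asked.
                handleDecision(command.argument);
                break;
            case MOVE:
                // Hold the Move Command until a Location or Launch Command arrives,
                // and highlight the aircraft currently being pointed at.
                pendingMove = command;
                highlightPointedAircraft();
                break;
            case LOCATION:
            case LAUNCH:
                if (pendingMove != null) {
                    // Combine the two base commands into a single deck action.
                    createDeckAction(pendingMove, command);
                    pendingMove = null;
                }
                break;
        }
    }

    private void handleDecision(String answer) { /* elided */ }

    private void highlightPointedAircraft() { /* elided: consult the Selection Engine */ }

    private void createDeckAction(SpeechCommandParser.BaseCommand move,
                                  SpeechCommandParser.BaseCommand target) {
        /* elided: fetch the destination via the Deck Object and create the Deck Action */
    }
}
```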
6. Related Work

This section presents previous work that inspired the DeckAssistant project.

6.1. Navy ADMACS

As mentioned in Section 1.2.2, the Navy is moving toward a more technologically developed and connected system called ADMACS, a real-time data management system that connects the carrier's air department, ship divisions, and the sailors who manage aircraft launch and recovery operations.

6.2. Deck Heuristic Action Planner

Ryan et al. have developed 'a decision support system for flight deck operations that utilizes a conventional integer linear program-based planning algorithm' [5]. In this system, a human operator inputs the end goals as well as constraints, and the algorithm returns a proposed schedule of operations for the operator's approval. Even though their experiments showed that human heuristics perform better than the plans produced by the algorithm, human decisions are usually conservative, and the system can offer alternate plans. This is an early attempt to aid planning on aircraft carriers.
7. Conclusion

In this thesis, we introduced improvements to DeckAssistant, a system that provides a traditional Ouija board interface by displaying a digital rendering of an aircraft carrier deck that assists deck handlers in planning deck operations. DeckAssistant has a large digital tabletop display that shows the status of the deck and has an understanding of certain deck actions for scenario planning. To preserve the conventional way of interacting with the old-school Ouija board, where deck handlers move aircraft by hand, the system takes advantage of multiple modes of interaction. Deck handlers plan strategies by pointing at aircraft, gesturing, and talking to the system. The system responds with its own speech and updates the display to show the consequences of the actions taken by the handlers. The system can also be used to simulate certain scenarios during the planning process. The multimodal interaction described here creates a communication of sorts between deck handlers and the system.

Our work includes three improvements to the initial version of DeckAssistant built by Kojo Acquah [1]. The first is the introduction of the Leap Motion sensor for pointing detection and gesture recognition. We presented our subjective assessment of why the Leap Motion device performs better than the Microsoft Kinect, and we explained how we achieve pointing detection and gesture recognition using the device. The second improvement is better speech synthesis, through a new speech synthesis library that provides high-quality pronunciation and clarity in speech. The third improvement is better speech recognition. We discussed the use of several speech recognition libraries, determined which one best fits our purposes, and explained how to integrate this new library into the current system with our own method of recording voice.
7.1. Future Work

While the current version of DeckAssistant focuses only on aircraft movement based on deck handler actions, future versions may implement algorithms that let the system simulate an optimal ordering of operations for an end goal, while accounting for deck and aircraft status such as maintenance needs.

Currently, DeckAssistant's display, created by the four downward-facing projectors mounted over the tabletop (discussed in Section 3.1), has a high pixel resolution. However, it is not as seamless as it should be. The ScalableDesktop software is used to perform automatic edge blending of the four projected images; however, the regions where the projectors overlap are still visible. Moreover, the ScalableDesktop software has to be run for calibration every time a user starts DeckViewer, and the brightness of the display is low. Instead of the projectors and the tabletop surface, a high-resolution touchscreen LED TV could be mounted flat on a table. This would provide a seamless display free of projector overlaps and remove the need for time-consuming calibration. In addition, with the touchscreen capability, we could introduce drawing gestures with which the deck handler draws out aircraft movements and takes notes directly on the screen.
8. References

[1] Kojo Acquah. Towards a Multimodal Ouija Board for Aircraft Carrier Deck Operations. June 2015.

[2] US Navy Air Systems Command. Navy Training System Plan for Aviation Data Management and Control System. March 2002.

[3] The Leap Motion Sensor. Leap Motion for Mac and PC. November 2015.

[4] SoX Documentation. http://sox.sourceforge.net/Docs/Documentation. February 2013.

[5] Ryan et al. Comparing the Performance of Expert User Heuristics and an Integer Linear Program in Aircraft Carrier Deck Operations. 2013.

[6] Ziezulewicz, Geoff. "Old-school 'Ouija Board' Being Phased out on Navy Carriers." Stars and Stripes, 10 Aug. 2011. Web. 3 Mar. 2016.

[7] Microsoft. Kinect for Windows Sensor Components and Specifications. Web. 7 Mar. 2016.

[8] Khoshelham, K.; Elberink, S.O. Accuracy and Resolution of Kinect Depth Data for Indoor Mapping Applications. Sensors 2012, 12, 1437-1454.

[9] Weichert, F.; Bachmann, D.; Rudak, B.; Fisseler, D. Analysis of the Accuracy and Robustness of the Leap Motion Controller. Sensors 2013, 13, 6380-6393.
9. Appendix

9.1. Code and Documentation

The source code of DeckAssistant, documentation on how to get up and running with the system, and the DeckAssistant Software Guide are available on GitHub: https://github.mit.edu/MUG-CSAIL/DeckViewer.