DMDI

i
Dynamic Multimodal Diagnostic Interface
A Report Submitted to the Graduate Faculty of
Auburn University
In Partial Fulfillment of the Requirements for the Degree of
Master of Science
In
Software Engineering
Auburn, Alabama
July 13, 2005

ii
Billy Thomas Baker, Jr.
Certificate of Approval
Juan E. Gilbert, Chair
Assistant Professor
Computer Science and Software
Engineering
Dean Hendrix
Associate Professor
Engineering
Cheryl Seals
Assistant Professor
Engineering

iii
Billy Thomas Baker, Jr.
Abstract
Multimodal interfaces are becoming increasingly important as a means of replacing or extending
the human presence. This project demonstrates a system for performing diagnostic interviews
using a Web based multimodal interface. Specifically, this report outlines the design and
implementation of a system that dynamically generates multimodal Web pages in conjunction
with a diagnostic dialog manager. This system enables a layperson or physician to participate in
a diagnostic conversation with the application where the goal is to arrive at a diagnosis or
decision. From the developer’s standpoint; this project suggests a simple and effective approach
for executing ad-hoc context dependent conversations in a multimodal interface. Design
considerations, implementation details, a practical example and future work are presented.

iv
Acknowledgement
I’ll begin by thanking Dr. Juan Gilbert. He has been a constant source of inspiration and his
ability to offer encouragement at the right time has, in large part, made this project possible. I
especially thank Dr. Gilbert for introducing me to the fie ld of multimodal interfaces. Thanks go
to Dr. Dean Hendrix and Dr. Cheryl Seals for supporting me as members of my committee.
Thank you to my fellow students in the HCCL lab for taking the time to read and critique this
report. The managers and executives at Southern Nuclear must be mentioned, their support over
the last few years has been invaluable and their acceptance of my eccentricities is appreciated.
Thanks to my son Dannon, he continues to vigorously engage me in insightful discussions on
human cognition and machine learning. I thank my brother, Dr. Jim, who provided much of the
rheumatologic diagnostic data and explained many of the alien terms to this layman. My wife
Ann has been an eternal source of encouragement, support, and tolerance. Without the support of
these good people I would not have gotten this far.

v
Contents
1. INTRODUCTION………………………………………………………………… 1
1.1. Approach………………………………………………..……………….. 2
2. BACKGROUND………………………………………………………………….. 3
2.1. Learning and Conversation………...…………………………………… 3
2.2. Diagnostic Process………………………………...……………………. 4
2.2.1. Medical Assessments………………………………………….. 6
2.2.2. Demographic Considerations……………….………………… 7
2.3. Multimodal Interaction Framework…………………………………….. 7
2.3.1 Input……………………………………………………………. 8
2.3.2 Output………………………………………………………….. 9
2.3.3 Interaction Management…………………………….…………. 10
2.3.4 Agent Functions……………………………………….………. 11
2.3.5 Session Component…………………………………….………. 11
2.3.6 System & Environment………………………………….……… 11
2.4 SALT……………………………………………………………………… 11
3. PROBLEM / SOLUTION………………………………………………………… 14
3.1 Problem…………………………………………………………………... 14
3.1.1 Access to Physicians…………………………………………… 15
3.1.2 Inconsistent Diagnostic Process……………………………….. 15

vi
3.1.3 Knowledge – Experience Disconnect……………….………….. 16
3.2 SOLUTION………………………………………………………………. 16
3.2.1 Physician Task Outsourcing…………………………………. 17
3.2.2 Composite Experience…………………………………………. 17
4. DEVELOPMENT………………………………………………………………… 19
4.1 Requirements…………………………….……………………………….. 19
4.2 Functional Overview………………….………………………………….. 19
4.2.1 Dynamic Page Generation…….……………………………….. 21
4.2.1.1 Embedded SALT……………………………………… 22
4.2.1.2 HTML Form………………………………………….. 22
4.2.1.3 Dynamic Grammar Generation……………………… 23
4.2.1.4 Page Control…………………………………………. 23
4.2.2 Diagnostic Dialog Manager…………………………………… 23
4.2.3 Diagnostic Decision Process…………………………………... 25
4.2.4 Data Structure………………………………………………….. 28
5. IMPLEMENTATION……………………………………………………………. 31
6. PRACTICAL APPLICATIONS………………………………………………….. 37
6.1 Clinical trainer…………………………………………………………… 37
6.2 Autonomous planning agent……………………………………………… 37
7. FUTURE WORK…………………………………………………………………. 39
8. REFERERNCES…………………………………………………………………. 41

vii
List of figures
1. Figure 1 - Connectionist Concept-Attribute Model Naïve Bayes Network........... 3
2. Figure 2 - Diagnostic Process……………………………………...…………… 6
3. Figure 3 - Multimedia Interaction Framework Overview…………………...….. 8
4. Figure 4 - Multimedia Interaction Framework - Input…………………………. 9
5. Figure 5 - Multimedia Interaction Framework - Output…………………........... 10
6. Figure 6 - Component Diagram……………………………………...…………. 21
7. Figure 7- Sequence Diagram………………………………...…….…………… 24
8. Figure 8 - Decision Process…………………………………………..………… 26
9. Figure 9 - Basic Application Diagram……………………………..…………… 32
10. Figure 10 - Example1: Tennis Elbow…………………………………………… 33
11. Figure 11 - Example2: Diagnosis Transition…………...………………………. 34
12 Figure 12 - Example3: Diagnosis………………………………………………. 35
13 Figure 13 - Future Work………………………………………………………... 40

viii
List of Tables
1. Table 1- Concept – Attribute Pairs……………………………………………. 29
2. Table 2 - Attribute Arguments…………………………………………………. 30
3. Table 3 - DOM Window.location object………………………………………. 36

1
1. INTRODUCTION
Humans learn (acquire knowledge) by experiencing their environment in terms of
primitive attributes related to their senses. Since most humans share the same array
of senses, they share a common set of primitive attributes that can be conveyed as
concepts to other humans via conversation or other means of communication. Speech
is the primary means humans have used in the past to communicate their experience
to others. The advent of writing enhanced communication with an “almost” time
independent media. Written concepts could be clearly communicated to people
hundreds of years in the future or thousands of miles away. The printing press
extended the influence of writing by making the knowledge and the experience
captured by writing available to all who could read and obtain the media. A few
hundred years later, computers and the internet now are positioned to make the
printing press irrelevant and have made communication effectively independent of
time. Concepts and knowledge can be shared with people all over the planet within
seconds of having been created or preserved for use decades later. Today we are on
the threshold of another step change in communication; machines are beginning to act
as our agents. They can collect information specific to our domain of interest and
within the context of our life style and needs, may use that information to enhance our
knowledge and awareness of our environment. People will in effect “outsource”
much of the tedious and labor intensive aspects of data mining to agents who in-turn
will help people answer questions, make decisions, recognize patterns and in some
cases take action for us.

2
1.1. APPROACH
The goal of this project was to demonstrate the potential of combining aspects of
artificial intelligence with a multimodal interface to deliver a human proxy for
conducting diagnostic interviews: a diagnostic agent. Since interviews are not
scripted but are context dependent, the application has to be robust enough to select
and ask questions by considering the information gained earlier in the conversation.
Given the need to ask questions in an “ad-hoc” environment, the application also
must be able to manage the dialog progress by generating multimodal interfaces on-
the-fly. (11,12) The World Wide Web Consortium (W3C) has proposed a framework
for multimodal interface implementation and to the extent possible; the application
has been designed to conform to that framework.(3) Significant effort was made to
keep the system as simple as possible; dialog management is implemented from the
client versus the server. Page control and data flow between the interface and the
dialog manager is accomplished on the client via the DOM (Document Object
Model). The application also provides feedback to the user or co-user by presenting a
graphic representation of the dialog scope and progress. In order to further simplify
deployment, system reference data is deployed using simple static text files; no
relational database management system is required. The application makes use of
PHP (PHP: Hypertext Preprocessor) for server side processing and data retrieval.
The Web GUI is DHTML with embedded SALT (Speech Application Language
Tags).

3
2. BACKGROUND
2.1. LEARNING and CONVERSATION
This report will consider a concept as a collection of primitive attributes and other
supporting concepts. The number of supporting concepts and primitive attributes that
define a concept can be equated to a level of knowledge about that concept. (15,16)
The strength of the associated interconnections between a concept, its attributes and
supporting concepts, can be directly related to ones ability to recognize that concept;
interconnection strength is, in effect, your level of familiarity and your experience
with that concept.
Figure 1
Connectionist Concept-Attribute Model
Naïve Bayes Network
Machine learning can be defined as the process by which a machine acquires
knowledge or experience. In the connectionist model it is postulated that networks

4
learn by changing the strengths of their interconnections and/or establishing new
interconnections in response to feedback (experience). Figure 1 illustrates the
concept – attribute relationship for a Naïve Bayes Network. (10) A tuple (a collection
of all the facts related to one entity, often a row in a table) representing a network
relationship would include the concept, the attribute, the connection strength and
attribute value at a minimum.
2.2. DIAGNOSTIC PROCESS
Traditionally, theoretic diagnostic methods have been categorized as one or a
combination of three primary reasoning techniques; probabilistic, deterministic and
causal.(20) Probabilistic or statistical reasoning makes conclusions based on the
statistical correlation between observed and reference attributes.(21) This technique
lends itself to mathematical definition (Bayes Theorem ref.) where the diagnosis is
promptly computed as soon as the pertinent attributes are assessed. Deterministic
reasoning makes conclusions based on the outcome of a series of binary rules
organized into logical progressions called decision trees.(18) The order of the rules in
the tree is optimized to minimize the number of rules needed to reach an outcome.
(19) Causal reasoning makes conclusions based on a comparison between actual
conditions and a “causal model” representing normality. Potential cause mechanisms
are either validated or excluded after comparison against the model.
In actual practice, diagnosis is the process of identifying the cause of a problem or
situation by identifying the distinguishing attributes of the problem or situation and

5
then relating those attributes to the distinguishing attributes of potential causes of that
problem or situation. When most effective, diagnosis is a combination of the three
reasoning techniques described above, but it is accomplished automatically without
any formal alignment with, or consideration of, the previously mentioned techniques.
(13) Robin C Fraser (1987) in his Clinical Method: A General Practice Approach (14)
states: “In actual clinical practice, however, such an approach to clinical problem-
solving is rarely used by general practitioners and infrequently used by hospital
doctors because it lacks discrimination and has a poor yield in terms of the time and
effort expended…..In reality, most clinicians reach diagnosis by a process of
hypothetico-deductive reasoning, i.e. by educated guessing and testing”. Not
surprisingly, a closer look reveals that diagnosis resembles the process humans
unconsciously use to recognize all aspects of their environment; namely, a holistic
(pattern) matching process of concepts and attributes. Pattern matching is not about
absolute matches but more about establishing the best match for the smallest attribute
set. This is the crux of the experience – knowledge relationship.
Diagnosis can be relegated to four distinct phases: problem recognition, problem or
cause attribute correlation, attribute assessment and feedback. Given a problem or
situation, correlation to potential causes results in one or more cause attributes being
identified that can be assessed with respect to the related problem attribute in order to
prove or disprove the potential cause. On a higher level, diagnosis is the process of
evaluating the degree of attribute correlation between a problem and possible causes.
The correlation process normally involves several attributes that may vary from
deterministic attributes (e.g. always applicable for a condition) to probabilistic

6
attributes (e.g. sometimes applicable to a condition). (17) The level of objectivity or
subjectivity inherent in the attribute assessment phase further complicates diagnosis.
Figure 2
Diagnosis: A Model. From Clinical Method, Robin C Fraser 1987.
2.2.1. MEDICAL ASSESSMENTS
The medical assessment, sometimes referred to as an impression, is the process by
which a physician evaluates patient medical history, family history, social
environment-demographics and, if applicable, observes current symptoms. The
assessment is normally initiated as a result of the patient communicating a complaint.
The ultimate goal of the assessment is to reach a diagnosis of the complaint and if
warranted propose a course of treatment. The validity of the diagnosis is dependent
on both the completeness and accuracy of the patient’s medical and family histories

7
as well as the thoroughness of the physician’s examination and dialogue with the
patient. The degree of thoroughness exercised in an examination can be correlated to
the degree of relevant experience the physician has with the condition being
evaluated. As mentioned earlier, that experience is effectively the doctors’
knowledge of the potential causes of the condition being evaluated and the observable
attributes of those causes. Results of the assessment are typically documented on
paper or transcribed from voice recordings and later reviewed by the physician for
accuracy and completeness.
2.2.2. DEMOGRAPHIC CONSIDERATIONS
The impact demographic attributes have on the accuracy of a diagnosis can be
significant but recognition of the influence of demographic specific attributes can be
difficult for a physician without exposure to large amounts of diagnostic data where
the full spectrum of demographic variations are included. Access to data, other than
what is gained via personal experience, is normally limited to that offered in medical
journals or lectures.
2.3. MULTIMODAL INTERACTION FRAMEWORK
The World Wide Web Consortium (W3C) is proposing a framework for multimodal
interaction. Simplistically, the multimodal interaction framework is comprised of an
Interaction Manager that accepts input from the user via one or multiple modes of input, such
as speaking, typing, mouse or gestures. The Interaction Manager acts as liaison between the
user and agent functions, session component and system / environment. Output from the

8
agent functions is presented to the user via one or more modalities; most commonly speech
and graphics. (1, 2)
The approach used to implement the interaction manager varies with the application but by
far the most common Web application is the speech enabled HTML form.(4)
Figure 3
Multimedia Interaction Framework – Overview
2.3.1. INPUT
The input component can be broken down into three sub-components; recognition,
interpretation and integration. The recognition component captures and translates user input
into a form that is useful to the interpretation component. Speech is converted into text using
language and acoustic models along with a speech recognition grammar. Mouse movement
and clicks are converted to x-y positions and key presses are converted into text based
characters. Other modes of input such as handwriting, DTMF, biometrics and vision would
be translated in this component. The interpretation component further processes input from
the recognition component primarily in cases where more than one recognition component
input value have the same meaning or semantic intention. The integration component
integrates the output of interpretation components to yield a synchronized and composite

9
output that is routed to the interaction manager.(6) An example of integration would be
synchronizing mouse input and speech input to yield a single user intention.
Figure 4
Multimedia Interaction Framework – Input
2.3.2. OUTPUT
The output component can also be broken down into three subcomponents; the generation
component, the styling component and the rendering component. The generation component
uses output from the interaction manger to determine the modality of information presented
to the user. In the case of a multimodalWeb page, the generation component would provide
both the graphics and speech outputs. The styling component inserts layout information. In
the case of speech, the “layout” information might be voice timbre, inflection and volume; in
the case of graphics, layout is the familiar position, size, color, etc. The rendering
component processes the information provided by the styling components into formats that

10
the user can understand. Speech output is converted into a voice; graphics output is
converted into text, controls and other graphic representations.
Figure 5
Multimedia Interaction Framework – Output
2.3.3. INTERACTION MANAGEMENT
The interaction management component coordinates the flow of interaction and execution
between the input and output components. On receipt of input information from the input
components, the interaction management component updates application context and
information. The updated context and information is then routed to the output components.
Several tools may be used to implement the interaction manager. Those tools include HTML,
XHTML, Speech Application Language Tags (SALT), C, C++ and X+V (XHTML plus
Voice).

11
2.3.4. AGENT FUNCTIONS
Agent functions evaluate the interaction state provided by the interaction manager and
respond with program flow directives. Business and process logic are from agent functions to
the user by way of the interaction manager.
2.3.5. SESSION COMPONENT
The session component provides an interface for requesting and releasing session resources
for distributed applications where one or more device or user is involved. The session
component is also instrumental in managing applications that require persistent and in
managing resources in distributed environments.
2.3.6. SYSTEM & ENVIRONMENT
The system and environment component will facilitate dynamic adaptation to changes in
device capabilities, environmental conditions and user preferences. This component will
modify the actions of the interaction manager as the number of devices and user’s changes;
both distributed and stand alone implementations must be supported.
2.4. SALT
Speech Application Language Tags (SALT) is an XML specification for elements that can be
embedded into an application to provide input/output control of speech recognition and
speech synthesis. SALT was contributed to the W3C in 2002 by the SALT Forum; an
industry group supported by Microsoft, Intel, Cisco, Comverse, Philips and ScanSoft
(originally SpeechWorks). Unlike VXML, SALT contains no support structures, interaction
flow must be provided by the host language. Elements that enable user speech input are

12
called prompts; elements that provide speech output are called responses. A brief overview
of the four top level SALT tags follows:
• <listen> for speech input; a speech input object is instantiated in the XML
document when this tag encountered. The listen element also contains grammar,
binding and recording controls:
o <grammar> specifies or references the domain of words and phrases that
the system will recognize. The actual grammar can be implemented as
either an integral part of the page or it can be contained in a separate file and
referenced via a universal resource indicator.
o <bind> integrates speech with host application logic by binding spoken
input value into the page.
o <record> records sounds, speech, etc.
• <prompt> for speech output: a speech output object instantiated in the XML
document when this tag encountered. The listen element also contains the grammar,
binding controls described above.
• <dtmf> for touch-tone input
• <smex> for platform messaging to enable platform call-control and telephony
features. This element also contains the binding control to bind process messages.
All four top level elements contain the <param> element. This element is used to extend
SALT elements with new functions. SALT as a whole can be extended with new
functionality using XML.
SALT pages can be viewed as being composed of three primary sections, data,
presentation and script. The data section defines the information the user will provide to the

13
application in order to meet sub-goals of the page. The presentation section contains speech
prompts, grammars and GUI objects. The script section manages dialog flow and also
manipulates the presentation section with various procedures. The modular aspects of a
SALT page allow the developer to approach multimodal solutions in much the same way
traditional GUI-design is approached. The design goal is achieved using a page-based
approach where goal sub-tasks are addressed on a page by page basis. (9) The modular
structure of a SALT page supports the Multimedia Interaction Framework described in
section 2.3 above.

14
3. PROBLEM / SOLUTION
3.1. Problem
The process by which a patient gets resolution of a complaint via common clinical
methods consumes significant physician resources and is therefore beyond the
reach of many people who do not have access to a doctor or who can not afford
the services of a doctor. Furthermore, there are significant inconsistencies with
respect to the scope and depth of diagnostic methods employed by one physician
when they are compared to other physicians facing similar scenarios. Finally,
since often only the outcome of the diagnostic process is documented, the
diagnosis and recommended treatment; those methods or thought processes
pertaining to the diagnosis are rarely communicated or shared with other
physicians. Only the attending physician gains experience from a given
diagnostic effort.
This project proposes an approach that attempts to address these issues,
specifically: 1) Limited access to physicians: Reduce the amount of physician
resources required to perform a patient assessment and diagnosis, 2) Inconsistent
Diagnostic process: Reduce the inconsistencies in diagnostic efficiency and
accuracy between physicians when addressing a specific complaint. 3)
Knowledge – Experience disconnect: No viable method for sharing the results of
diagnostic efforts to improve the overall level of experience of the physician
communityand diagnostic agents.

15
3.1.1. Access to Physicians
Access to physicians is expensive, it is time consuming and it may be delayed by
weeks or months depending on that physician’s schedule. In all likelihood, the
people reading this report can both afford the expense of a visit to the doctor and
have the disposable time to devote to the visit, but that is not the case for many
others. The challenge then is to extend the doctors presence by “out-sourcing”
some of what the doctor traditionally does to other resources that are less
expensive and more accessible. Family history, medical history; the forms that
we now fill out prior to seeing a physician are simple low-tech examples of a
trend in that direction. The problem with forms in general is that they require a
certain level of reading and writing skill or vocabulary that I might not possess.
Most people are more comfortable just talking to someone and answering a few
relevant questions.
3.1.2. Inconsistent Diagnostic Process
Accurate diagnosis relies on the ability of the physician to match attribute patterns
typical of a condition or disease to those attributes exhibited by the patient.
Effective and efficient diagnosis relies on the ability of the physician to focus on
the most significant attributes and not be distracted by applicable but less
significant attributes. The physician’s ability to diagnose is framed by his
experience and on-hand reference or lack thereof. Inconsistencies between
physicians in both accuracy and efficiency arise when their experience and
knowledge vary. This is especially evident when a physician is exposed to a

16
patient demographic that is different than the demographics on which his training
or previous practice was based. The challenge here is to make available a system
that employs a consistent approach to applying experience to the diagnostic
process; more specifically, provide a system uses greedy pattern matching as an
alternative to more common clinical methods (i.e. educated guessing and testing).
3.1.3. Knowledge – Experience Disconnect
Since little of the actual thought process or methods employed by a physician
during diagnostic efforts are documented, the only one who stands to benefit in
terms of knowledge or experience from the knowledge gained during diagnostic
interviews is the physician conducting the interview. Even then, the passage of
time will erode the physician’s recollection and that experience will be lost to
everyone. The challenge then is to create a framework that can provide both a
means of saving diagnostic decision processes and also provide a means of
distilling those decisions and sharing them as composite experience to physicians
and their (our) agents.
3.2. SOLUTION
Design and implement a system that will extend the physicians presence by
performing diagnostic interviews based on a consistent and logical evaluation of
causal attributes. The application will conduct the interview as a spoken question
and answer session between the patient and the machine with a medical attendant
present. Create composite experience. Data will be structured such that it can be

17
updated to reflect the experience gained from each successful diagnosis
performed by the system. This is accomplished by adding new concept – attribute
pairs or by refining concept – attribute pair connection strength as experience is
gained
3.2.1. Physician Task Outsourcing
The application will perform those tasks that deal with data collection and it will
also perform an initial diagnosis. Just as is the case during interviews conducted
by physicians; application generated questions are phrased using the layman’s
equivalent of the medical attribute being evaluated and then spoken to the patient.
Each successive question is based on the answers given to those questions asked
earlier in the interview. Both the direction of the diagnostic conversation and the
ultimate diagnosis are dependent on the patient’s answers. Consistency is
achieved at the diagnostic level not by the specific questions asked, but by the
relevance of the questions asked.
3.2.2. Composite Experience
Composite experience is the ability of the system to leverage case history and
diagnostic methods across physician and practice divides. Each time a diagnosis
is made the system’s data is structured so that it can be updated to reflect the
impact on the system’s experience. The strength of the connection between the
diagnosis and the complaint can be adjusted based on either positive or negative
feedback. Attributes that were used to make the diagnosis can also be updated

18
with respect to the strength of their connection to the diagnosis. Thus experience
becomes equivalent to the collective levels of connectivity between a condition or
disease and those attributes that define it. The experience of the system should
grow in proportion to its level of use just as our experience grows with our
involvement in a process or endeavor. To better reflect the impact of time on
experience, it is proposed that the connection strength of certain attributes should
decay over time such that changes in disease patterns, especially with respect to
demographics, can be detected more readily. This is discussed under future work
in more detail.

19
4. DEVELOPMENT
4.1. Requirements
The high level requirements for the system are as follows:
• Keep it simple
• Implement the W3C multimodal framework to the extent possible
• Provide a generic solution, keep it knowledge domain independent.
• Provide a simple and portable data representation of the concept-attribute
relationship that includes connection strength and attribute value domain.
• Provide a simple and portable data representation of standard attribute
properties with respect to both graphic and speech output generation.
• Provide a method for examining concept attributes in a logical context aware
sequence using a minimum number of questions but yielding a high
confidence concept match.
• Provide a dynamic multimodal interface for gathering user input via both
conversation and pointing device.
• Provide a graphic representation ofthe progress of the input gathering effort
with respect to pattern matching. Distinction between positive and negative
evaluations must be intuitive.
• Provide a tool to view dialog manager status.
4.2. Functional Overview
The application is structured as three distinct layers; client, business and data. The
client layer implements the input aspects of the multimodal framework and also hosts
the dialog manager and agent functions. At the client layer, tasks are divided between

20
the parent page and the multimodal page. The parent page hosts the dialog manager
and agent functions, the multimodal page implements the multimodal framework
input functions.
The business layer implements the output aspects of the multimodal framework,
specifically page generation, styling and rendering. Data connectivity is also
supported at this layer.
The data layer is comprised of two data files. One defines the knowledge network
and the other defines attribute properties that are used by the page generator for ad-
hoc multimodal page creation. Figure 6, below, illustrates the application component
relationships.

21
$
WWULEXWH
.
H
Figure 6
Component Diagram
4.2.1. Dynamic Page Generation
The multimodal page generator constructs HTML+SALT pages based on
directives from the DDM. The basic tasks performed by the page generator are
inline grammar generation, HTML / SALT generation and input validation.
For the purposes of this discussion page generator tasks are viewed as belonging
to one of two areas; output component generation or input component generation.

22
Output components include grammar and HTML / SALT generation while input
addresses input validation.
The generated page supports two distinct types of media; what the user can say
and what the user can see, grammar and HTML respectively. Grammar generation
and HTML generation with embedded SALT is a nuclear operation. The dialog
manager passes an attribute key to the page generator. The page generator
retrieves an attribute query set corresponding to the attribute key from the server.
The query set contains the attribute key, a visual queue, an audio queue, an input
type and valid input values. Using the query set as the argument, a Web page is
generated containing embedded SALT, an HTML form and an inline grammar.
Standard input handling functions and slot handling are also inserted into the
page.
4.2.1.1. Embedded SALT
The Multimodal page generator embeds SALT tags into the generated page
based on the propertycontext of the focus attribute. The <listen>,<prompt>,
<grammar> and <bind> tags are embedded with the appropriate arguments.
4.2.1.2. HTML Form
The multimodal page generator creates an ad-hoc HTML page containing a
<form> tag and <input> tags. The <input> tag type, label and value is
dictated by the property context of the focus attribute. Invisible <input> tags
are created to serve as session control flags.

23
4.2.1.3. Dynamic Grammar Generation
An inline grammar is generated using arguments retrieved from attribute
properties data set relative to the attribute being evaluated. The data set is
used to define the domain of acceptable responses within the context of the
prompt. Due to the specificity of the questions asked by this application, the
data domain is primarily one that evaluates a positive or negative response.
4.2.1.4. Page Control
JavaScript functions are embedded by the page generator to provide page
control. Page results are passed back to the dialog manager along with a
process request using the Document Object Model.
4.2.2. Diagnostic Dialog Manager
The diagnostic dialog manager organizes and evaluates concepts and attributes
related to some area domain or in this case a provisional list of potential problems
/ diagnostic possibilities. Acting on a key word or words in the user’s response,
the DDM initializes the diagnostic interview by importing all relevant concepts
and their related attributes for evaluation. The diagnostic process generally used
by the application is based on selecting the next most relevant question within the
context of the conversation’s progress up to that moment.

24
Figure 7
Sequence Diagram
Dialog Process Flow
1. System retrieves the area of interest domain
2. System constructs a Web page with embedded
SALT and an inline grammar based on the area
domain key words.
3. The user identifies the area of interest by

25
uttering a phrase that contains the area of
interest keyword(s) or by using the mouse to
pick an entry from the list of values presented
by the HTML form.
4. The diagnostic dialog manager (DDM)
retrieves the associated domain of concept –
attribute pairs and organizes them based on
concept – attribute pair connector strength.
5. The DDM selects the most relevant concept –
attribute pair and calls the dynamic page
generator with a reference to the selected
attribute.
6. The dynamic page generator retrieves attribute
arguments from the data source and constructs
a multimodal page representing the first
question of the diagnostic conversation. The
question pertains to the attribute with the
highest attribute strength related to the class
with the highest attribute strength.
4.2.3. Diagnostic Decision Process
Entry into the decision process occurs at the point when the user answers the first
question posed by the dialog manager.

26
Figure 8
Decision Process
Decision Process
7. The user utters a response to the question or selects
their response from the Web page list of values.

27
8. The user’s answer to the question is passed back to
the DDM. The DDM updates the value of all class-
attribute pairs with the same attribute based on that
response.
9. The DDM performs a reconciliation pass through
the attribute domain to produce a pattern matching
score grouped and aggregated by concept based on
updated attribute values.
10. Depending on the conversation base, a set number
of questions must be answered prior to
transitioning the dialog manager from a lockstep
progression to a pattern matching progression. In
either case, the DDM selects the next concept –
attribute pair and hands it off to the page generator.
11. Steps 5 through 8 are repeated until either a pattern
matching score equals or exceeds the set
confidence threshold or all questions are asked.
As discussed earlier, relevancy in a pattern matching or diagnostic process is based on
the magnitude of the concept – attribute pair connector. When several concepts are
being evaluated as a potential match for an initial condition, relevancy must be
established at two or more levels. In other words, the most relevant question would be

28
one that tests the most relevant attribute of the most relevant diagnosis for the
expressed condition. Each time an answer is provided to a question, diagnosis
relevancy may change depending on the context of that answer. If another diagnosis
becomes more relevant, the system immediately shifts its line of questioning to
address attributes that pertain to the most relevant diagnosis. The process of
questioning and evaluation continues until either the minimum threshold for percent
confidence is exceeded or until all attributes for all potential diagnosis have been
characterized.
4.2.4. Data Structure
In order to eliminate the need for a database server and preserve application
simplicity, data is stored as tab delimited text files.
The diagnostic data structure supports both the “experience” aspect of the diagnostic
process, the visual aspects of the speech interface and the linguistic and semantic
aspects of an inline grammar. The Concept – Attribute data structure is comprised of
tuples with five properties; Type, Concept Description, Attribute Description,
Connector Weight and Positive Response. The Concept – Attribute data structure
represents a simple Naïve Bayes Network.

29
Component Description Examples
Type Entry type Concept OR
Attribute
c a
Concept
Description
If type=”c” Area Description; if
type=”a” Concept Description
elbow medial_
epicondylitis
Attribute
Description
If type=”c” Concept
Description, if type =”a”
Attribute Description
medial_
epicondylitis
flexation_pain
Connector Weight Connection magnitude;
relevancy
456 200
Positive Response The response value that
satisfies a true state for the
attribute. (not used for
type=”c”
Yes
Table 1
Concept – Attribute Pairs
The Attribute Argument data structure defines the verbal and visual queues needed to
communicate a question about a specific attribute. Information in the attribute argument
data structure is retrieved by the page generator. Each tuple in the data structure contains
five properties; attribute, visual queue, speech queue, input type and input domain. The
attribute property relates the two tables.

30
Component Description Examples
Attribute Attribute Key – sets
relationship to concept-
attribute pairs.
flexation_pain
Visual Queue FORM label Is the pain worse with resisted
flexation?
Speech Queue SALT prompt Is the pain worse when you pull on
something?
Input Type Input Type YN
Input Domain Allowed Input Value(s) Yes,No,Yep,Nop
Table 2
Attribute Arguments

31
5. IMPLEMENTATION
The application is implemented as a parent Web page with an inline frame supporting a
dynamic multimodal page. Figure 9 represents a high level view of the application. As
described in previous chapters, the parent Web page hosts the dialog manager and the
diagnostic agent. The parent page yields control to the multimodal page each time a
multimodal prompt is generated. After the user responds, functions embedded in the
multimodal page generate a semantic interpretation of the users spoken or pointer
generated response. Page control functions in the multimodal page use the DOM to
transfer the semantic interpretation to the parent Web page. The multimodal page then
issues a reset command to the parent page. The reset method prompts the dialog manager
to pass the semantic interpretation to the diagnostic agent for evaluation and network
updates. At the conclusion of the updates, the diagnostic agent passes the next attribute
to be evaluated to the dialog manager, which, in-turn, updates the source of the inline
frame with a call to the dynamic page generator using the attribute as the argument.
A few scenarios are described below to provide a practical and more detailed
understanding of the system in operation.

32
Figure 9
Basic Application Diagram
Figure 10 below shows the interface after having selected the area “elbow”. Note that all
potential diagnosis are listed in the page header and illustrated in the progress table on the
left hand side of the page. The progress table is organized by Concept and Attribute in
descending connector strength. The initial visual prompt is “Is the pain focal”, and the
corresponding layman’s equivalent spoken prompt is “Is your pain only in a specific
area?” At this point the system is waiting for either a speech or a pointer generated reply
from the user. Control of the application has been momentarily transferred to the
multimodal page. When the user provides a response the system will generate a semantic
interpretation of the response. The interpreted response and application control will be
returned to the parent page dialog manager. Figure 11 below illustrates the condition of
the progress panel after several questions have been asked. Green entries represent
attributes that have evaluated as a positive matchto a potential diagnosis. Red entries are
attributes that have evaluated as a negative match to a potential diagnosis and the yellow
entry is the attribute that the system is currently requesting user input on. The

33
illustration adjacent to the progress panel is a pop-up window that shows internal status
of the dialog manager. In this case you can see the percentage positive weight is highest
for lateral epicondylitis (tennis elbow); this indicates that the system will continue
evaluating attributes for that diagnosis until the required minimum confidence threshold
has been achieved. The pop-up is used primarily during application tuning and is not
intended to be part of the user’s toolset.
Figure 10
Example 1 – Tennis elbow
The next example below illustrates the results of a diagnostic interview where the system
has transitioned from the most common diagnosis for elbow conditions, lateral
epicondylitis, to the least common diagnosis.

34
Figure 11
Example 2: Diagnosis Transition
The transition occurred when the patient indicated that there was local swelling. This
caused the percent positive weight to shift to a diagnosis of arthritis. The system now
pursues confirmation of this diagnosis by confirming additional attributes through
questioning. Note the next question pertains to both bursitis and arthritis; since both
share the same attribute and value, the diagnostic focus is unlikely to shift based on this
question.
Figure 13 illustrates the results status of the system following evaluation of the next two
questions. At this point, the system has recognized that the minimum confidence
threshold has been reached and is proposing a diagnosis. Note that a diagnosis of
bursitis was possible but based on the relative weighting of the attributes a diagnosis of
arthritis is more likely.

35
Figure 12
Example 3: Diagnosis
A snapshot of the diagnostic session is saved for future reference and system updates.
The intention is to document the diagnostic decision process in a format that can be used
by a feedback mechanism. (Ref. Future Work).
The dialog manager calls the dynamic page generator with the diagnosis and the
associated confidence factor. The page generator creates a multimodal page with a
diagnosis prompt and asks the user if there are other areas or conditions that need to be
evaluated. If the user response is “Yes”, the system generates a call to update the
window.location object for the parent object with the referrer to the inline frame, which is
in effect recalling the parent Web page.

36
IF (ANS=="YES")
{
WINDOW.LOCATION=DOCUMENT.REFERRER
} ELSE {
WINDOW.LOCATION="HTTP://WWW.AUBURN.EDU/~BAKERBT";
}
Table 3
DOM Window.location object

37
6. PRACTICAL APPLICATIONS
The concepts used in the medical diagnostic application described above are applicable to
several other areas where there is a need to extend the presence of humans who possess
knowledge and experience in focused areas.
6.1. Clinical trainer
A large part of a physicians training is hands-on clinical involvement. This is where the
physician learns to apply knowledge gained in medical school to real world scenarios.
The effectiveness of traditional clinical training is challenged by local demographics, the
number of patients that can realistically be seen and the limited exposure the physician
has to a large spectrum of conditions or illnesses. Using the diagnostic application as a
guide, the physician can augment experience in an area by performing drills with the
application for a given condition and demographic segment.
6.2. Autonomous planning agent
Heavy industry installations spend millions of dollars each year on asset maintenance and
the work controls process. A medium size plant often has over a dozen technical
workers dedicated to planning work. Their primary job is to provide repair plans so
equipment can be fixed and returned to service with a minimum impact on production.
An important, and often difficult, aspect of developing a repair plan is to diagnose the
cause of the reported problem. The challenge in many cases is that the problem is not
documented very well and the planner rarely has occasion to talk to the person reporting
the problem. A solution to this would be to allow users to report the problem to a
planning agent. The planning agent could ask relevant questions at the time the problem

38
is reported; this approach would both provide a more detailed problem description and
have the potential to establish the cause of the problem. Once the cause has been
identified the system could automatically generate a work plan.

39
7. FUTURE WORK
This application is a work in progress or proof of concept and as such is not intended to
be implemented as a complete solution. Figure 14 illustrates the mature architecture with
the current implementation in light blue. There are three general areas of additional
functionality needed to move this project to a production status. The first is a method for
documenting patient history, including demographic attributes, so that the diagnostic
agent can “prune” potential diagnosis based on that data. Although not essential to
obtaining a diagnosis, the ability to consider patient history and demographics can reduce
the number of attributes that need to be evaluated to arrive at a high confidence diagnosis.
The system would use demographic membership and patient histories to, in effect, prune
the knowledge network.
The second is a method for feeding back the results of diagnostic interviews into the
system. This would occur after a follow-up visit with the patient confirms that the
diagnosis was correct. The set of attributes evaluated to produce the diagnosis would be
fed back into the knowledge network by incrementing the connection strength for those
attributes. This practice allows the system to rapidly gain experience. The addition of an
“aging” rate on certain demographic attributes will make the system more responsive to
changes in social and cultural shifts.
The third enhancement is the creation of a multimodal learning interface where
humans can communicate via speech their experience to the machine in terms of
concepts, attributes and attribute values. This will be my next research area.

40
' QDPLF' LDJQRVWLF0 XOWLP RGDO,QWHUIDFH
. QRZOHGJH%DVH
8VHU
' HPRJUDSKLF
3HUVSHFWLYH
3DWLHQW
+LVWRU
Z
' HPRJUDSKLF
$WWULEXWHV
' LDJQRVWLF
6HVVLRQ
' LDJQRVWLF$WWULEXWHV
' LDJQRVLV
) ROORZ 8 S
GLDJQRVLV
FRQILUPDWLRQ
8VHU
0 XOWLPRGDOWHDFKLQJ
VHVVLRQ
Figure 13
Future work

41
8. REFERERNCES
1. Larson, James A. “How to Converse with a Virtual Agent by Speaking
and Listening Using Standard W3C Languages”. Retrieved May 21, 2005
from http://www.larson-tech.com/Writings/VR.pdf
2. Larson, James A. “Standard Languages for Developing Multimodal
Applications”. Retrieved May 21, 2005 from http://www.larson-
tech.com/Writings/multimodal.pdf
3. W3C Multimodal Interaction Framework. Retrieved May 21, 2005 from
http://www.w3.org/TR/mmi-framework/
4. W3C Multimodal Architecture and Interfaces. Retrieved May 21, 2005
from http://www.w3.org/TR/mmi-arch/
5. Introduction to Cognitive Science Website:
http://www.unc.edu/depts/cogsci/123/connectionist3.htm
6. Puncher, Michael., Kepesi, Marian. “Multimodal Mobile Robot Control
using Speech ApplicationLanguage Tags”. Retrieved May 21, 2005 from
http://userver.ftw.at/~pucher/papers/mmrobot1.pdf
7. Salces, Fausto J., Llewellyn-Jones, David., Merabti, Madjid. “Multimodal
Interfaces in a Ubiquitous Computing Environment”. Retrieved May 21,
2005 from http://www.bath.ac.uk/comp-sci/hci/UK-
Ubinet%20Files/Llewellyn-Jones/FSainz-3rdUbinet.pdf
8. Villasenor-Pineda, L., Montes-y-Gomez, M., Caelen, J.. “A Modal Logic
Framework for Human-Computer Spoken Interaction”. Retrieved May 21,
2005 from
http://ccc.inaoep.mx/~mmontesg/publicaciones/2004/LogicFramework-
CicLing04.pdf
9. Wang, Kuansa. “SALT: An XML Application for Web-based Multimodal
Dialog Management”. , 2nd
Workshop on NLP and XML(NLPXML-2002)
Taipei, September 1, 2002 (The 19th
International Conference on
Computational Linguistics). Retrieved May 21, 2005 from
http://acl.ldc.upenn.edu/W/W02/W02-1715.pdf

42
10. Keogh, Eamonn J., Pazzani, Micheal J.. “Learning Augmented Bayesian
Classifiers: A comparison of Distribution-based and Classification-based
Approaches”. Retrieved May 21, 2005 from
http://www.ics.uci.edu/~pazzani/Publications/EamonnAIStats.pdf
11. Reitter, David., Panttaja, Erin Marie., Cummins, Fred. “UI on the Fly:
Generating a Multimodal User Interface”. Retrieved May 21, 2005 from
http://www.medialabeurope.org/research/library/reitter-
etal_uifly_2004.pdf
12. Panttaja, Erin Marie., Reitter, David., Cummins, Fred. “The Evaluation of
Adaptable Multimodal System Outputs”. Retrieved May 21, 2005 from
http://www.reitter-it-media.de/compling/papers/panttaja-
etal_evaluation_2004.pdf
13. Tomassi, Paul. “Logic and Diagnostic”. Retrieved May 21, 2005 from
http://www.ul.ie/~philos/vol3/gnostic.html
14. Fraser, Robin C. “Clinical Method: General Practice Approach”.
Butterworth and Co. London. 1987
15. Aydede, Murat, “The Language of Thought Hypothesis”, The Stanford
Encyclopedia of Philoshophy (Fall 2004 Edition), Edward N. Zalta (ed.),
Retrieved from http://plato.stanford.edu/archives/fall2004/entries/logic-ai/
16. Dietterich, Thomas G. “Machine-Learning Research: Four Current
Directions” The American Association for Artificial Intelligence.
Retrieved on May 21, 2005 from
http://www.aaai.org/Library/Magazine/Vol18/18-04/Papers/AIMag18-04-
010.pdf
17. Ragan, Brian., Zhu, Weimo., Kang, Minsoo., Flegel, Melinida.
“Construction of an Ankle Injury Diagnostic Decision Tree”. Retrieved
May 21, 2005 from
http://www.kines.uiuc.edu/labWebpages/Kinesmetrics/Presentations/Data
%20mining_02/Web-pdf/DMfinal_3.pdf
18. Moret, Bernard M. E. “Decision Trees and Diagrams” ACM Comput.
Surv. Vol 14-4, ACM Press, New York, 1982

43
19. Eardley, David D.., Aronsky, Dominik., Chapman, Wendy W., Haug,
Peter J.. “Using Decision Tree Classifiers to Confirm Pneumonia
Diagnosis” . Retrieved on May 21, 2005 from
http://www.amia.org/pubs/symposia/D200520.PDF
20. Kahn, Charles E. Jr. M.D., Haddawy, Peter, Ph.D., “Optimizing
Diagnositc and Therapeutic Strategies using Decision-Theoretic Planning:
Principles and Applications”. Retrieved May 21, 2005 from
http://www.mcw.edu/midas/papers/Medinfo-1995.pdf
21. Druzdzel Marek J., Diez, Francisso J. “Combining Knowledge from
Different Sources in Causal Probalistic Models”. Journal of Machine
Learning Research 4(2003) 295-316, July 2003

DMDI

Recommended

Recommended

More Related Content

What's hot

What's hot (17)

Viewers also liked

Viewers also liked (16)

Similar to DMDI

Similar to DMDI (19)

DMDI