Dynamic Multimodal Diagnostic Interface
A Report Submitted to the Graduate Faculty of
Auburn University
In Partial Fulfillment of the Requirements for the Degree of
Master of Science
In
Software Engineering
Auburn, Alabama
July 13, 2005
Dynamic Multimodal Diagnostic Interface
Billy Thomas Baker, Jr.
Certificate of Approval
Juan E. Gilbert, Chair
Assistant Professor
Computer Science and Software
Engineering
Dean Hendrix
Associate Professor
Computer Science and Software
Engineering
Cheryl Seals
Assistant Professor
Computer Science and Software
Engineering
Dynamic Multimodal Diagnostic Interface
Billy Thomas Baker, Jr.
Abstract
Multimodal interfaces are becoming increasingly important as a means of replacing or extending
the human presence. This project demonstrates a system for performing diagnostic interviews
using a Web-based multimodal interface. Specifically, this report outlines the design and
implementation of a system that dynamically generates multimodal Web pages in conjunction
with a diagnostic dialog manager. This system enables a layperson or physician to participate in
a diagnostic conversation with the application where the goal is to arrive at a diagnosis or
decision. From the developer’s standpoint, this project suggests a simple and effective approach
for executing ad hoc, context-dependent conversations in a multimodal interface. Design
considerations, implementation details, a practical example and future work are presented.
Acknowledgement
I’ll begin by thanking Dr. Juan Gilbert. He has been a constant source of inspiration and his
ability to offer encouragement at the right time has, in large part, made this project possible. I
especially thank Dr. Gilbert for introducing me to the field of multimodal interfaces. Thanks go
to Dr. Dean Hendrix and Dr. Cheryl Seals for supporting me as members of my committee.
Thank you to my fellow students in the HCCL lab for taking the time to read and critique this
report. The managers and executives at Southern Nuclear must be mentioned; their support over
the last few years has been invaluable and their acceptance of my eccentricities is appreciated.
Thanks to my son Dannon, who continues to vigorously engage me in insightful discussions on
human cognition and machine learning. I thank my brother, Dr. Jim, who provided much of the
rheumatologic diagnostic data and explained many of the alien terms to this layman. My wife
Ann has been an eternal source of encouragement, support, and tolerance. Without the support of
these good people I would not have gotten this far.
Contents
1. INTRODUCTION
1.1. Approach
2. BACKGROUND
2.1. Learning and Conversation
2.2. Diagnostic Process
2.2.1. Medical Assessments
2.2.2. Demographic Considerations
2.3. Multimodal Interaction Framework
2.3.1. Input
2.3.2. Output
2.3.3. Interaction Management
2.3.4. Agent Functions
2.3.5. Session Component
2.3.6. System & Environment
2.4. SALT
3. PROBLEM / SOLUTION
3.1. Problem
3.1.1. Access to Physicians
3.1.2. Inconsistent Diagnostic Process
3.1.3. Knowledge – Experience Disconnect
3.2. Solution
3.2.1. Physician Task Outsourcing
3.2.2. Composite Experience
4. DEVELOPMENT
4.1. Requirements
4.2. Functional Overview
4.2.1. Dynamic Page Generation
4.2.1.1. Embedded SALT
4.2.1.2. HTML Form
4.2.1.3. Dynamic Grammar Generation
4.2.1.4. Page Control
4.2.2. Diagnostic Dialog Manager
4.2.3. Diagnostic Decision Process
4.2.4. Data Structure
5. IMPLEMENTATION
6. PRACTICAL APPLICATIONS
6.1. Clinical Trainer
6.2. Autonomous Planning Agent
7. FUTURE WORK
8. REFERENCES
List of Figures
1. Figure 1 - Connectionist Concept-Attribute Model Naïve Bayes Network
2. Figure 2 - Diagnostic Process
3. Figure 3 - Multimodal Interaction Framework - Overview
4. Figure 4 - Multimodal Interaction Framework - Input
5. Figure 5 - Multimodal Interaction Framework - Output
6. Figure 6 - Component Diagram
7. Figure 7 - Sequence Diagram
8. Figure 8 - Decision Process
9. Figure 9 - Basic Application Diagram
10. Figure 10 - Example 1: Tennis Elbow
11. Figure 11 - Example 2: Diagnosis Transition
12. Figure 12 - Example 3: Diagnosis
13. Figure 13 - Future Work
List of Tables
1. Table 1 - Concept – Attribute Pairs
2. Table 2 - Attribute Arguments
3. Table 3 - DOM window.location Object
1. INTRODUCTION
Humans learn (acquire knowledge) by experiencing their environment in terms of
primitive attributes related to their senses. Since most humans share the same array
of senses, they share a common set of primitive attributes that can be conveyed as
concepts to other humans via conversation or other means of communication. Speech
is the primary means humans have used in the past to communicate their experience
to others. The advent of writing enhanced communication with an “almost” time-
independent medium. Written concepts could be clearly communicated to people
hundreds of years in the future or thousands of miles away. The printing press
extended the influence of writing by making the knowledge and the experience
captured by writing available to all who could read and obtain the media. A few
hundred years later, computers and the internet now are positioned to make the
printing press irrelevant and have made communication effectively independent of
time. Concepts and knowledge can be shared with people all over the planet within
seconds of having been created or preserved for use decades later. Today we are on
the threshold of another step change in communication: machines are beginning to act
as our agents. They can collect information specific to our domain of interest and,
within the context of our lifestyle and needs, may use that information to enhance our
knowledge and awareness of our environment. People will, in effect, “outsource”
much of the tedious and labor-intensive aspects of data mining to agents that, in turn,
will help people answer questions, make decisions, recognize patterns and in some
cases take action for us.
1.1. APPROACH
The goal of this project was to demonstrate the potential of combining aspects of
artificial intelligence with a multimodal interface to deliver a human proxy for
conducting diagnostic interviews: a diagnostic agent. Since interviews are not
scripted but are context dependent, the application has to be robust enough to select
and ask questions by considering the information gained earlier in the conversation.
Given the need to ask questions in an “ad-hoc” environment, the application also
must be able to manage the dialog progress by generating multimodal interfaces on-
the-fly. (11,12) The World Wide Web Consortium (W3C) has proposed a framework
for multimodal interface implementation and, to the extent possible, the application
has been designed to conform to that framework. (3) Significant effort was made to
keep the system as simple as possible; dialog management is implemented on the
client rather than the server. Page control and data flow between the interface and the
dialog manager are accomplished on the client via the DOM (Document Object
Model). The application also provides feedback to the user or co-user by presenting a
graphic representation of the dialog scope and progress. In order to further simplify
deployment, system reference data is deployed using simple static text files; no
relational database management system is required. The application makes use of
PHP (PHP: Hypertext Preprocessor) for server side processing and data retrieval.
The Web GUI is DHTML with embedded SALT (Speech Application Language
Tags).
2. BACKGROUND
2.1. LEARNING and CONVERSATION
This report will consider a concept as a collection of primitive attributes and other
supporting concepts. The number of supporting concepts and primitive attributes that
define a concept can be equated to a level of knowledge about that concept. (15,16)
The strength of the associated interconnections between a concept, its attributes and
supporting concepts, can be directly related to one’s ability to recognize that concept;
interconnection strength is, in effect, one’s level of familiarity and experience
with that concept.
Figure 1
Connectionist Concept-Attribute Model
Naïve Bayes Network
Machine learning can be defined as the process by which a machine acquires
knowledge or experience. In the connectionist model it is postulated that networks
learn by changing the strengths of their interconnections and/or establishing new
interconnections in response to feedback (experience). Figure 1 illustrates the
concept – attribute relationship for a Naïve Bayes Network. (10) A tuple (a collection
of all the facts related to one entity, often a row in a table) representing a network
relationship would include, at a minimum, the concept, the attribute, the connection
strength and the attribute value.
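As an illustration only (the concepts, attributes and weights below are invented, not drawn from the report's data files), such tuples and a simple connection-weighted match score can be sketched as:

```javascript
// Illustrative sketch of the minimal network tuple described above:
// (concept, attribute, connection strength, attribute value).
// All identifiers and weights are hypothetical.
const network = [
  { concept: "tennisElbow", attribute: "elbowPain",     strength: 0.9, value: "yes" },
  { concept: "tennisElbow", attribute: "gripWeakness",  strength: 0.6, value: "yes" },
  { concept: "arthritis",   attribute: "elbowPain",     strength: 0.5, value: "yes" },
  { concept: "arthritis",   attribute: "jointSwelling", strength: 0.8, value: "yes" }
];

// Score a concept against observed attribute values by summing the
// strengths of the connections whose expected values match.
function score(concept, observations) {
  return network
    .filter(t => t.concept === concept && observations[t.attribute] === t.value)
    .reduce((sum, t) => sum + t.strength, 0);
}
```

Here the score is simply the sum of matching connection strengths; a true Naïve Bayes network would multiply conditional probabilities instead, but the underlying tuple structure is the same.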
2.2. DIAGNOSTIC PROCESS
Traditionally, theoretic diagnostic methods have been categorized as one or a
combination of three primary reasoning techniques: probabilistic, deterministic and
causal.(20) Probabilistic or statistical reasoning makes conclusions based on the
statistical correlation between observed and reference attributes.(21) This technique
lends itself to mathematical definition (Bayes Theorem ref.) where the diagnosis is
promptly computed as soon as the pertinent attributes are assessed. Deterministic
reasoning makes conclusions based on the outcome of a series of binary rules
organized into logical progressions called decision trees.(18) The order of the rules in
the tree is optimized to minimize the number of rules needed to reach an outcome.
(19) Causal reasoning makes conclusions based on a comparison between actual
conditions and a “causal model” representing normality. Potential cause mechanisms
are either validated or excluded after comparison against the model.
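The probabilistic technique above reduces to Bayes' theorem; a minimal worked sketch follows, with prevalence and likelihood figures invented purely for illustration:

```javascript
// P(disease | finding) = P(finding | disease) * P(disease) / P(finding)
// All numbers below are hypothetical, for illustration only.
function posterior(prior, likelihood, evidence) {
  return (likelihood * prior) / evidence;
}

// Example: prior prevalence of 1%, the finding occurs in 80% of cases of
// the disease, and in 10% of the population overall; observing the finding
// raises the probability of the disease from 1% to 8%.
const p = posterior(0.01, 0.8, 0.1);
```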
In actual practice, diagnosis is the process of identifying the cause of a problem or
situation by identifying the distinguishing attributes of the problem or situation and
then relating those attributes to the distinguishing attributes of potential causes of that
problem or situation. When most effective, diagnosis is a combination of the three
reasoning techniques described above, but it is accomplished automatically without
any formal alignment with, or consideration of, the previously mentioned techniques.
(13) Robin C Fraser (1987) in his Clinical Method: A General Practice Approach (14)
states: “In actual clinical practice, however, such an approach to clinical problem-
solving is rarely used by general practitioners and infrequently used by hospital
doctors because it lacks discrimination and has a poor yield in terms of the time and
effort expended…..In reality, most clinicians reach diagnosis by a process of
hypothetico-deductive reasoning, i.e. by educated guessing and testing”. Not
surprisingly, a closer look reveals that diagnosis resembles the process humans
unconsciously use to recognize all aspects of their environment; namely, a holistic
(pattern) matching process of concepts and attributes. Pattern matching is not about
absolute matches but more about establishing the best match for the smallest attribute
set. This is the crux of the experience – knowledge relationship.
Diagnosis can be divided into four distinct phases: problem recognition, problem or
cause attribute correlation, attribute assessment and feedback. Given a problem or
situation, correlation to potential causes results in one or more cause attributes being
identified that can be assessed with respect to the related problem attribute in order to
prove or disprove the potential cause. On a higher level, diagnosis is the process of
evaluating the degree of attribute correlation between a problem and possible causes.
The correlation process normally involves several attributes that may vary from
deterministic attributes (e.g. always applicable for a condition) to probabilistic
attributes (e.g. sometimes applicable to a condition). (17) The level of objectivity or
subjectivity inherent in the attribute assessment phase further complicates diagnosis.
Figure 2
Diagnosis: A Model. From Clinical Method, Robin C Fraser 1987.
2.2.1. MEDICAL ASSESSMENTS
The medical assessment, sometimes referred to as an impression, is the process by
which a physician evaluates patient medical history, family history, social
environment-demographics and, if applicable, observes current symptoms. The
assessment is normally initiated as a result of the patient communicating a complaint.
The ultimate goal of the assessment is to reach a diagnosis of the complaint and if
warranted propose a course of treatment. The validity of the diagnosis is dependent
on both the completeness and accuracy of the patient’s medical and family histories
as well as the thoroughness of the physician’s examination and dialogue with the
patient. The degree of thoroughness exercised in an examination can be correlated to
the degree of relevant experience the physician has with the condition being
evaluated. As mentioned earlier, that experience is effectively the doctor’s
knowledge of the potential causes of the condition being evaluated and the observable
attributes of those causes. Results of the assessment are typically documented on
paper or transcribed from voice recordings and later reviewed by the physician for
accuracy and completeness.
2.2.2. DEMOGRAPHIC CONSIDERATIONS
The impact demographic attributes have on the accuracy of a diagnosis can be
significant, but recognizing the influence of demographic-specific attributes can be
difficult for a physician without exposure to large amounts of diagnostic data in which
the full spectrum of demographic variations is represented. Access to data, other than
what is gained via personal experience, is normally limited to that offered in medical
journals or lectures.
2.3. MULTIMODAL INTERACTION FRAMEWORK
The World Wide Web Consortium (W3C) is proposing a framework for multimodal
interaction. Simplistically, the multimodal interaction framework is composed of an
Interaction Manager that accepts input from the user via one or more modes of input, such
as speaking, typing, mouse or gestures. The Interaction Manager acts as a liaison between the
user and agent functions, session component and system / environment. Output from the
agent functions is presented to the user via one or more modalities; most commonly speech
and graphics. (1, 2)
The approach used to implement the interaction manager varies with the application, but by
far the most common Web approach is the speech-enabled HTML form. (4)
Figure 3
Multimodal Interaction Framework – Overview
2.3.1. INPUT
The input component can be broken down into three sub-components: recognition,
interpretation and integration. The recognition component captures and translates user input
into a form that is useful to the interpretation component. Speech is converted into text using
language and acoustic models along with a speech recognition grammar. Mouse movement
and clicks are converted to x-y positions and key presses are converted into text based
characters. Other modes of input such as handwriting, DTMF, biometrics and vision would
be translated in this component. The interpretation component further processes input from
the recognition component, primarily in cases where more than one recognition component
input value has the same meaning or semantic intention. The integration component
integrates the output of interpretation components to yield a synchronized and composite
output that is routed to the interaction manager.(6) An example of integration would be
synchronizing mouse input and speech input to yield a single user intention.
Figure 4
Multimodal Interaction Framework – Input
2.3.2. OUTPUT
The output component can also be broken down into three subcomponents: the generation
component, the styling component and the rendering component. The generation component
uses output from the interaction manager to determine the modality of information presented
to the user. In the case of a multimodal Web page, the generation component would provide
both the graphics and speech outputs. The styling component inserts layout information. In
the case of speech, the “layout” information might be voice timbre, inflection and volume; in
the case of graphics, layout is the familiar position, size, color, etc. The rendering
component processes the information provided by the styling components into formats that
the user can understand. Speech output is converted into a voice; graphics output is
converted into text, controls and other graphic representations.
Figure 5
Multimodal Interaction Framework – Output
2.3.3. INTERACTION MANAGEMENT
The interaction management component coordinates the flow of interaction and execution
between the input and output components. On receipt of input information from the input
components, the interaction management component updates application context and
information. The updated context and information is then routed to the output components.
Several tools may be used to implement the interaction manager. Those tools include HTML,
XHTML, Speech Application Language Tags (SALT), C, C++ and X+V (XHTML plus
Voice).
2.3.4. AGENT FUNCTIONS
Agent functions evaluate the interaction state provided by the interaction manager and
respond with program flow directives. Business and process logic are conveyed from agent
functions to the user by way of the interaction manager.
2.3.5. SESSION COMPONENT
The session component provides an interface for requesting and releasing session resources
for distributed applications where one or more devices or users are involved. The session
component is also instrumental in managing applications that require persistence and in
managing resources in distributed environments.
2.3.6. SYSTEM & ENVIRONMENT
The system and environment component will facilitate dynamic adaptation to changes in
device capabilities, environmental conditions and user preferences. This component will
modify the actions of the interaction manager as the number of devices and users changes;
both distributed and stand-alone implementations must be supported.
2.4. SALT
Speech Application Language Tags (SALT) is an XML specification for elements that can be
embedded into an application to provide input/output control of speech recognition and
speech synthesis. SALT was contributed to the W3C in 2002 by the SALT Forum; an
industry group supported by Microsoft, Intel, Cisco, Comverse, Philips and ScanSoft
(originally SpeechWorks). Unlike VXML, SALT contains no support structures; interaction
flow must be provided by the host language. Elements that enable user speech input are
called listens; elements that provide speech output are called prompts. A brief overview
of the four top-level SALT tags follows:
• <listen> for speech input: a speech input object is instantiated in the XML
document when this tag is encountered. The listen element also contains grammar,
binding and recording controls:
o <grammar> specifies or references the domain of words and phrases that
the system will recognize. The actual grammar can be implemented as
either an integral part of the page or it can be contained in a separate file and
referenced via a Uniform Resource Identifier (URI).
o <bind> integrates speech with host application logic by binding spoken
input values into the page.
o <record> records sounds, speech, etc.
• <prompt> for speech output: a speech output object is instantiated in the XML
document when this tag is encountered. The prompt element also contains the
binding controls described above.
• <dtmf> for touch-tone input
• <smex> for platform messaging to enable platform call-control and telephony
features. This element also contains the binding control used to bind platform messages.
All four top level elements contain the <param> element. This element is used to extend
SALT elements with new functions. SALT as a whole can be extended with new
functionality using XML.
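To make the tag overview concrete, a minimal HTML-embedded fragment might look as follows (element content, ids and the salt: namespace prefix here are illustrative; consult the SALT specification for exact syntax):

```xml
<!-- Illustrative only: a prompt, a listen with an inline yes/no grammar,
     and a bind that copies the recognized value into a form field. -->
<salt:prompt id="askPain">Do you have pain in your elbow?</salt:prompt>

<salt:listen id="getAnswer">
  <salt:grammar>
    <grammar root="answer">
      <rule id="answer">
        <one-of>
          <item>yes</item>
          <item>no</item>
        </one-of>
      </rule>
    </grammar>
  </salt:grammar>
  <!-- bind the spoken value into the host page -->
  <salt:bind targetelement="txtAnswer" value="//answer"/>
</salt:listen>
```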
SALT pages can be viewed as being composed of three primary sections: data,
presentation and script. The data section defines the information the user will provide to the
application in order to meet sub-goals of the page. The presentation section contains speech
prompts, grammars and GUI objects. The script section manages dialog flow and also
manipulates the presentation section with various procedures. The modular aspects of a
SALT page allow the developer to approach multimodal solutions in much the same way
traditional GUI-design is approached. The design goal is achieved using a page-based
approach where goal sub-tasks are addressed on a page-by-page basis. (9) The modular
structure of a SALT page supports the Multimodal Interaction Framework described in
section 2.3 above.
3. PROBLEM / SOLUTION
3.1. Problem
The process by which a patient gets resolution of a complaint via common clinical
methods consumes significant physician resources and is therefore beyond the
reach of many people who do not have access to a doctor or who can not afford
the services of a doctor. Furthermore, there are significant inconsistencies with
respect to the scope and depth of diagnostic methods employed by one physician
when they are compared to other physicians facing similar scenarios. Finally,
since often only the outcome of the diagnostic process (the diagnosis and
recommended treatment) is documented, the methods and thought processes
behind the diagnosis are rarely communicated or shared with other
physicians. Only the attending physician gains experience from a given
diagnostic effort.
This project proposes an approach that attempts to address these issues,
specifically: 1) Limited access to physicians: reduce the amount of physician
resources required to perform a patient assessment and diagnosis; 2) Inconsistent
diagnostic process: reduce the inconsistencies in diagnostic efficiency and
accuracy between physicians when addressing a specific complaint; 3)
Knowledge – experience disconnect: provide a viable method for sharing the results of
diagnostic efforts to improve the overall level of experience of the physician
community and diagnostic agents.
3.1.1. Access to Physicians
Access to physicians is expensive and time consuming, and it may be delayed by
weeks or months depending on the physician’s schedule. In all likelihood, the
people reading this report can both afford the expense of a visit to the doctor and
have the disposable time to devote to the visit, but that is not the case for many
others. The challenge then is to extend the doctor’s presence by “outsourcing”
some of what the doctor traditionally does to other resources that are less
expensive and more accessible. The family and medical history forms that we
now fill out prior to seeing a physician are simple, low-tech examples of a
trend in that direction. The problem with forms in general is that they require a
certain level of reading and writing skill, or vocabulary, that a patient might not possess.
Most people are more comfortable just talking to someone and answering a few
relevant questions.
3.1.2. Inconsistent Diagnostic Process
Accurate diagnosis relies on the ability of the physician to match attribute patterns
typical of a condition or disease to those attributes exhibited by the patient.
Effective and efficient diagnosis relies on the ability of the physician to focus on
the most significant attributes and not be distracted by applicable but less
significant attributes. The physician’s ability to diagnose is framed by his
experience and on-hand reference or lack thereof. Inconsistencies between
physicians in both accuracy and efficiency arise when their experience and
knowledge vary. This is especially evident when a physician is exposed to a
patient demographic that is different than the demographics on which his training
or previous practice was based. The challenge here is to make available a system
that employs a consistent approach to applying experience to the diagnostic
process; more specifically, provide a system that uses greedy pattern matching as an
alternative to more common clinical methods (i.e. educated guessing and testing).
3.1.3. Knowledge – Experience Disconnect
Since little of the actual thought process or methods employed by a physician
during diagnostic efforts is documented, the only one who stands to benefit in
terms of knowledge or experience from a diagnostic
interview is the physician conducting the interview. Even then, the passage of
time will erode the physician’s recollection and that experience will be lost to
everyone. The challenge then is to create a framework that can provide both a
means of saving diagnostic decision processes and also provide a means of
distilling those decisions and sharing them as composite experience to physicians
and their (our) agents.
3.2. SOLUTION
Design and implement a system that will extend the physician’s presence by
performing diagnostic interviews based on a consistent and logical evaluation of
causal attributes. The application will conduct the interview as a spoken question
and answer session between the patient and the machine with a medical attendant
present. The system will also create composite experience: data will be
structured such that it can be updated to reflect the experience gained from each
successful diagnosis performed by the system. This is accomplished by adding
new concept – attribute pairs or by refining concept – attribute pair connection
strength as experience is gained.
3.2.1. Physician Task Outsourcing
The application will perform those tasks that deal with data collection and it will
also perform an initial diagnosis. Just as is the case during interviews conducted
by physicians, application-generated questions are phrased using the layman’s
equivalent of the medical attribute being evaluated and then spoken to the patient.
Each successive question is based on the answers given to those questions asked
earlier in the interview. Both the direction of the diagnostic conversation and the
ultimate diagnosis are dependent on the patient’s answers. Consistency is
achieved at the diagnostic level not by the specific questions asked, but by the
relevance of the questions asked.
3.2.2. Composite Experience
Composite experience is the ability of the system to leverage case history and
diagnostic methods across physician and practice divides. The system’s data is
structured so that, each time a diagnosis is made, it can be updated to reflect the
impact on the system’s experience. The strength of the connection between the
diagnosis and the complaint can be adjusted based on either positive or negative
feedback. Attributes that were used to make the diagnosis can also be updated
with respect to the strength of their connection to the diagnosis. Thus experience
becomes equivalent to the collective levels of connectivity between a condition or
disease and those attributes that define it. The experience of the system should
grow in proportion to its level of use just as our experience grows with our
involvement in a process or endeavor. To better reflect the impact of time on
experience, it is proposed that the connection strength of certain attributes should
decay over time such that changes in disease patterns, especially with respect to
demographics, can be detected more readily. This is discussed under future work
in more detail.
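The update and decay mechanism described above can be sketched as follows; the learning rate and decay factor are invented parameters, not values proposed by the report:

```javascript
// Sketch of the composite-experience update. LEARNING_RATE and DECAY
// are hypothetical illustration values.
const LEARNING_RATE = 0.1; // how strongly one case moves a connection
const DECAY = 0.99;        // per-period fade toward "forgetting"

// Reinforce (or weaken) a concept-attribute connection after feedback:
// +1 for a confirmed diagnosis, -1 for a refuted one.
function updateStrength(strength, feedback) {
  const updated = strength + LEARNING_RATE * feedback;
  return Math.min(1, Math.max(0, updated)); // clamp to [0, 1]
}

// Apply time decay so stale experience fades and shifting disease
// patterns (e.g. demographic drift) can surface in newer data.
function decayStrength(strength, periods) {
  return strength * Math.pow(DECAY, periods);
}
```

Clamping to [0, 1] keeps a connection strength interpretable as a degree of association, and the multiplicative decay lets stale experience fade gradually rather than abruptly.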
4. DEVELOPMENT
4.1. Requirements
The high level requirements for the system are as follows:
• Keep it simple
• Implement the W3C multimodal framework to the extent possible
• Provide a generic solution; keep it knowledge-domain independent.
• Provide a simple and portable data representation of the concept-attribute
relationship that includes connection strength and attribute value domain.
• Provide a simple and portable data representation of standard attribute
properties with respect to both graphic and speech output generation.
• Provide a method for examining concept attributes in a logical, context-aware
sequence using a minimum number of questions but yielding a high
confidence concept match.
• Provide a dynamic multimodal interface for gathering user input via both
conversation and pointing device.
• Provide a graphic representation of the progress of the input gathering effort
with respect to pattern matching. Distinction between positive and negative
evaluations must be intuitive.
• Provide a tool to view dialog manager status.
4.2. Functional Overview
The application is structured as three distinct layers: client, business and data. The
client layer implements the input aspects of the multimodal framework and also hosts
the dialog manager and agent functions. At the client layer, tasks are divided between
the parent page and the multimodal page. The parent page hosts the dialog manager
and agent functions, the multimodal page implements the multimodal framework
input functions.
The business layer implements the output aspects of the multimodal framework,
specifically page generation, styling and rendering. Data connectivity is also
supported at this layer.
The data layer is comprised of two data files. One defines the knowledge network
and the other defines attribute properties that are used by the page generator for ad-
hoc multimodal page creation. Figure 6, below, illustrates the application component
relationships.
Figure 6
Component Diagram
4.2.1. Dynamic Page Generation
The multimodal page generator constructs HTML+SALT pages based on
directives from the DDM. The basic tasks performed by the page generator are
inline grammar generation, HTML / SALT generation and input validation.
For the purposes of this discussion, page generator tasks are viewed as belonging
to one of two areas: output component generation or input component generation.
Output components include grammar and HTML / SALT generation while input
addresses input validation.
The generated page supports two distinct types of media: what the user can say
and what the user can see, grammar and HTML respectively. Grammar generation
and HTML generation with embedded SALT is a single atomic operation. The dialog
manager passes an attribute key to the page generator. The page generator
retrieves an attribute query set corresponding to the attribute key from the server.
The query set contains the attribute key, a visual cue, an audio cue, an input
type and valid input values. Using the query set as the argument, a Web page is
generated containing embedded SALT, an HTML form and an inline grammar.
Standard input handling functions and slot handling are also inserted into the
page.
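A sketch of this generation step follows, assuming a hypothetical query-set shape (the field names visualCue, audioCue, inputType and validValues mirror the description above but are not the report's actual identifiers):

```javascript
// Hypothetical sketch of dynamic page generation from a query set.
function generatePage(querySet) {
  // Inline grammar built from the set of valid input values.
  const items = querySet.validValues
    .map(v => `<item>${v}</item>`)
    .join("");
  const grammar =
    `<salt:grammar><grammar root="v"><rule id="v"><one-of>${items}</one-of></rule></grammar></salt:grammar>`;

  // HTML form plus embedded SALT prompt/listen, all driven by the query set.
  return [
    `<form id="f">`,
    `<label>${querySet.visualCue}</label>`,
    `<input type="${querySet.inputType}" name="${querySet.attributeKey}"/>`,
    `<input type="hidden" name="sessionFlag" value="active"/>`, // session control flag
    `</form>`,
    `<salt:prompt>${querySet.audioCue}</salt:prompt>`,
    `<salt:listen>${grammar}<salt:bind targetelement="${querySet.attributeKey}" value="//v"/></salt:listen>`
  ].join("\n");
}
```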
4.2.1.1. Embedded SALT
The multimodal page generator embeds SALT tags into the generated page
based on the property context of the focus attribute. The <listen>, <prompt>,
<grammar> and <bind> tags are embedded with the appropriate arguments.
4.2.1.2. HTML Form
The multimodal page generator creates an ad hoc HTML page containing a
<form> tag and <input> tags. The <input> tag's type, label and value are
dictated by the property context of the focus attribute. Invisible <input> tags
are created to serve as session control flags.
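As a concrete sketch of what such a generated page might look like, the following hypothetical generator assembles the SALT tags, the HTML form and the hidden control flags from a query set. All element names and query-set field names here are assumptions for illustration, not the report's actual identifiers:

```javascript
// Hypothetical sketch of the page generator's combined HTML + SALT output.
// A query set carries the fields described in section 4.2.1: attribute key,
// visual cue, speech cue, input type and valid input values.
function generateMultimodalPage(querySet) {
  var inputs = querySet.inputDomain
    .map(function (v) {
      return '<input type="radio" name="answer" value="' + v + '">' + v;
    })
    .join("\n");
  return [
    "<html><body>",
    // SALT prompt: what the user hears.
    "<prompt id='askUser'>" + querySet.speechCue + "</prompt>",
    // SALT listen with a reference to the inline grammar, plus a bind that
    // copies the recognized value into the HTML form (slot handling).
    "<listen id='recoAnswer'>",
    "  <grammar src='#" + querySet.attribute + "_gram'/>",
    "  <bind targetelement='answer' value='//answer'/>",
    "</listen>",
    "<form id='diagForm'>",
    "<label>" + querySet.visualCue + "</label>",
    inputs,
    // Invisible input used as a session control flag.
    "<input type='hidden' name='focusAttribute' value='" + querySet.attribute + "'>",
    "</form>",
    "</body></html>",
  ].join("\n");
}
```

The SALT <bind> is what lets a spoken answer and a mouse-picked answer land in the same form field, so the rest of the pipeline does not care which modality produced it.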
23
4.2.1.3. Dynamic Grammar Generation
An inline grammar is generated using arguments retrieved from the attribute
properties data set for the attribute being evaluated. The data set
defines the domain of acceptable responses within the context of the
prompt. Because of the specificity of the questions asked by this application, the
data domain is primarily one that evaluates a positive or negative response.
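A minimal sketch of this step, assuming an SRGS-style XML grammar whose alternatives are simply the attribute's allowed input values (the rule name and exact markup are illustrative, not taken from the report):

```javascript
// Hypothetical sketch: build an inline grammar whose <one-of> alternatives
// are the allowed input values for the focus attribute.
function generateInlineGrammar(ruleName, inputDomain) {
  var items = inputDomain
    .map(function (v) {
      return "      <item>" + v + "</item>";
    })
    .join("\n");
  return [
    '<grammar root="' + ruleName + '">',
    '  <rule id="' + ruleName + '">',
    "    <one-of>",
    items,
    "    </one-of>",
    "  </rule>",
    "</grammar>",
  ].join("\n");
}
```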
4.2.1.4. Page Control
JavaScript functions are embedded by the page generator to provide page
control. Page results are passed back to the dialog manager along with a
process request using the Document Object Model.
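The report does not show this transfer code, but the handoff can be sketched as follows, with parentPage standing in for window.parent, and the element id and processResponse entry point being assumed names:

```javascript
// Hypothetical sketch of the embedded page-control step: write the semantic
// interpretation into a field on the parent page via the DOM, then ask the
// parent's dialog manager to process it.
function returnResultToParent(parentPage, attribute, interpretation) {
  var slot = parentPage.document.getElementById("lastAnswer");
  slot.value = attribute + "=" + interpretation;
  parentPage.processResponse(); // assumed dialog-manager entry point
}
```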
4.2.2. Diagnostic Dialog Manager
The diagnostic dialog manager organizes and evaluates concepts and attributes
related to an area domain, in this case a provisional list of potential problems
or diagnostic possibilities. Acting on a key word or words in the user's response,
the DDM initializes the diagnostic interview by importing all relevant concepts
and their related attributes for evaluation. The diagnostic process generally used
by the application is based on selecting the next most relevant question within the
context of the conversation’s progress up to that moment.
Figure 7
Sequence Diagram
Dialog Process Flow
1. The system retrieves the area of interest domain.
2. The system constructs a Web page with embedded SALT and an inline grammar based on the area domain key words.
3. The user identifies the area of interest by uttering a phrase that contains the area of interest keyword(s) or by using the mouse to pick an entry from the list of values presented by the HTML form.
4. The diagnostic dialog manager (DDM) retrieves the associated domain of concept – attribute pairs and organizes them based on concept – attribute pair connector strength.
5. The DDM selects the most relevant concept – attribute pair and calls the dynamic page generator with a reference to the selected attribute.
6. The dynamic page generator retrieves attribute arguments from the data source and constructs a multimodal page representing the first question of the diagnostic conversation. The question pertains to the attribute with the highest attribute strength related to the concept with the highest attribute strength.
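Step 5's selection rule can be sketched as a scan for the strongest unevaluated connector; the field names here are assumptions matching Table 1's columns:

```javascript
// Hypothetical sketch of step 5: among concept-attribute pairs not yet
// evaluated, select the one with the greatest connector weight.
function selectNextPair(pairs) {
  var best = null;
  for (var i = 0; i < pairs.length; i++) {
    var p = pairs[i];
    if (p.evaluated) continue;
    if (best === null || p.weight > best.weight) best = p;
  }
  return best; // null when every attribute has been asked
}
```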
4.2.3. Diagnostic Decision Process
Entry into the decision process occurs at the point when the user answers the first
question posed by the dialog manager.
Figure 8
Decision Process
Decision Process
7. The user utters a response to the question or selects a response from the Web page list of values.
8. The user's answer to the question is passed back to the DDM. The DDM updates the value of all concept – attribute pairs sharing that attribute based on the response.
9. The DDM performs a reconciliation pass through the attribute domain to produce a pattern matching score, grouped and aggregated by concept, based on the updated attribute values.
10. Depending on the conversation base, a set number of questions must be answered before the dialog manager transitions from a lockstep progression to a pattern matching progression. In either case, the DDM selects the next concept – attribute pair and hands it off to the page generator.
11. Steps 5 through 8 are repeated until either a pattern matching score equals or exceeds the set confidence threshold or all questions have been asked.
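Steps 8 and 9 can be sketched as follows. The report does not publish its scoring formula, so the "percent positive weight" here is an assumed ratio of confirmed connector weight to total connector weight per concept:

```javascript
// Hypothetical sketch of step 8: after an answer, mark every pair that
// shares the answered attribute and record whether the answer matched
// the attribute's positive response value.
function recordAnswer(pairs, attribute, answer) {
  pairs.forEach(function (p) {
    if (p.attribute === attribute) {
      p.evaluated = true;
      p.matched = answer === p.positiveResponse;
    }
  });
}

// Hypothetical sketch of step 9: score each concept as the fraction of its
// total connector weight that has been positively confirmed so far.
function reconcile(pairs) {
  var scores = {};
  pairs.forEach(function (p) {
    var s = scores[p.concept] || (scores[p.concept] = { positive: 0, total: 0 });
    s.total += p.weight;
    if (p.evaluated && p.matched) s.positive += p.weight;
  });
  Object.keys(scores).forEach(function (c) {
    scores[c].confidence = scores[c].positive / scores[c].total;
  });
  return scores;
}
```

Because one answer updates every concept that shares the attribute, the ranking can shift after any question, which is exactly the diagnosis-transition behavior described in the examples below.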
As discussed earlier, relevancy in a pattern matching or diagnostic process is based on
the magnitude of the concept – attribute pair connector. When several concepts are
being evaluated as a potential match for an initial condition, relevancy must be
established at two or more levels. In other words, the most relevant question would be
one that tests the most relevant attribute of the most relevant diagnosis for the
expressed condition. Each time an answer is provided to a question, diagnosis
relevancy may change depending on the context of that answer. If another diagnosis
becomes more relevant, the system immediately shifts its line of questioning to
address attributes that pertain to the most relevant diagnosis. The process of
questioning and evaluation continues until either the minimum threshold for percent
confidence is exceeded or all attributes for all potential diagnoses have been
characterized.
4.2.4. Data Structure
In order to eliminate the need for a database server and preserve application
simplicity, data is stored as tab-delimited text files.
The diagnostic data structure supports the "experience" aspect of the diagnostic
process, the visual aspects of the speech interface, and the linguistic and semantic
aspects of an inline grammar. The Concept – Attribute data structure comprises
tuples with five properties: Type, Concept Description, Attribute Description,
Connector Weight and Positive Response. The Concept – Attribute data structure
represents a simple naïve Bayes network.
Type: entry type, "c" for concept or "a" for attribute (examples: c, a)
Concept Description: if type="c", the area description; if type="a", the concept description (examples: elbow, medial_epicondylitis)
Attribute Description: if type="c", the concept description; if type="a", the attribute description (examples: medial_epicondylitis, flexation_pain)
Connector Weight: connection magnitude, i.e. relevancy (examples: 456, 200)
Positive Response: the response value that satisfies a true state for the attribute; not used for type="c" (example: Yes)
Table 1
Concept – Attribute Pairs
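Since the knowledge network is a tab-delimited file carrying the five fields of Table 1, loading it might look like the following hypothetical parser (field names are assumptions):

```javascript
// Hypothetical parser for the tab-delimited Concept-Attribute file.
// Columns follow Table 1: type, concept, attribute, weight, positive response.
function parseConceptAttributeData(text) {
  return text
    .split("\n")
    .filter(function (line) { return line.trim() !== ""; })
    .map(function (line) {
      var f = line.split("\t");
      return {
        type: f[0],                      // "c" (concept) or "a" (attribute)
        concept: f[1],
        attribute: f[2],
        weight: parseInt(f[3], 10),      // connector weight / relevancy
        positiveResponse: f[4] || null   // absent when type === "c"
      };
    });
}
```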
The Attribute Argument data structure defines the verbal and visual cues needed to
communicate a question about a specific attribute. Information in the attribute argument
data structure is retrieved by the page generator. Each tuple in the data structure contains
five properties: attribute, visual cue, speech cue, input type and input domain. The
attribute property relates the two tables.
Attribute: attribute key; sets the relationship to the Concept – Attribute pairs (example: flexation_pain)
Visual Cue: FORM label (example: "Is the pain worse with resisted flexation?")
Speech Cue: SALT prompt (example: "Is the pain worse when you pull on something?")
Input Type: input type (example: YN)
Input Domain: allowed input value(s) (example: Yes,No,Yep,Nop)
Table 2
Attribute Arguments
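Retrieving the query set for a focus attribute is then a simple lookup keyed on the attribute property. A sketch (the actual retrieval code is not shown in the report):

```javascript
// Hypothetical lookup: given the focus attribute key passed by the DDM,
// return the matching attribute-argument tuple (Table 2 row) that the
// page generator uses to build the next multimodal page.
function getQuerySet(attributeArgs, attributeKey) {
  for (var i = 0; i < attributeArgs.length; i++) {
    if (attributeArgs[i].attribute === attributeKey) return attributeArgs[i];
  }
  throw new Error("no attribute arguments for " + attributeKey);
}
```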
5. IMPLEMENTATION
The application is implemented as a parent Web page with an inline frame supporting a
dynamic multimodal page. Figure 9 represents a high level view of the application. As
described in previous chapters, the parent Web page hosts the dialog manager and the
diagnostic agent. The parent page yields control to the multimodal page each time a
multimodal prompt is generated. After the user responds, functions embedded in the
multimodal page generate a semantic interpretation of the user's spoken or pointer-
generated response. Page control functions in the multimodal page use the DOM to
transfer the semantic interpretation to the parent Web page. The multimodal page then
issues a reset command to the parent page. The reset method prompts the dialog manager
to pass the semantic interpretation to the diagnostic agent for evaluation and network
updates. At the conclusion of the updates, the diagnostic agent passes the next attribute
to be evaluated to the dialog manager, which, in turn, updates the source of the inline
frame with a call to the dynamic page generator using the attribute as the argument.
A few scenarios are described below to provide a practical and more detailed
understanding of the system in operation.
Figure 9
Basic Application Diagram
Figure 10 below shows the interface after the area "elbow" has been selected. Note that all
potential diagnoses are listed in the page header and illustrated in the progress table on the
left-hand side of the page. The progress table is organized by Concept and Attribute in
descending connector strength. The initial visual prompt is "Is the pain focal?", and the
corresponding layman's equivalent spoken prompt is "Is your pain only in a specific
area?" At this point the system is waiting for either a speech or a pointer-generated reply
from the user. Control of the application has been momentarily transferred to the
multimodal page. When the user provides a response, the system will generate a semantic
interpretation of the response. The interpreted response and application control will be
returned to the parent page dialog manager. Figure 11 below illustrates the condition of
the progress panel after several questions have been asked. Green entries represent
attributes that have evaluated as a positive match to a potential diagnosis. Red entries are
attributes that have evaluated as a negative match to a potential diagnosis, and the yellow
entry is the attribute for which the system is currently requesting user input. The
illustration adjacent to the progress panel is a pop-up window that shows the internal status
of the dialog manager. In this case the percentage positive weight is highest
for lateral epicondylitis (tennis elbow); this indicates that the system will continue
evaluating attributes for that diagnosis until the required minimum confidence threshold
has been achieved. The pop-up is used primarily during application tuning and is not
intended to be part of the user's toolset.
Figure 10
Example 1 – Tennis elbow
The next example below illustrates the results of a diagnostic interview where the system
has transitioned from the most common diagnosis for elbow conditions, lateral
epicondylitis, to the least common diagnosis.
Figure 11
Example 2: Diagnosis Transition
The transition occurred when the patient indicated that there was local swelling. This
caused the percent positive weight to shift to a diagnosis of arthritis. The system now
pursues confirmation of this diagnosis by confirming additional attributes through
questioning. Note the next question pertains to both bursitis and arthritis; since both
share the same attribute and value, the diagnostic focus is unlikely to shift based on this
question.
Figure 12 illustrates the status of the system following evaluation of the next two
questions. At this point, the system has recognized that the minimum confidence
threshold has been reached and is proposing a diagnosis. Note that a diagnosis of
bursitis was possible, but based on the relative weighting of the attributes a diagnosis of
arthritis is more likely.
Figure 12
Example 3: Diagnosis
A snapshot of the diagnostic session is saved for future reference and system updates.
The intention is to document the diagnostic decision process in a format that can be used
by a feedback mechanism (see Future Work).
The dialog manager calls the dynamic page generator with the diagnosis and the
associated confidence factor. The page generator creates a multimodal page with a
diagnosis prompt and asks the user if there are other areas or conditions that need to be
evaluated. If the user's response is "Yes", the system updates the parent window's
window.location object with the inline frame's referrer, which in effect reloads the
parent Web page.
if (ans == "YES") {
    window.location = document.referrer;
} else {
    window.location = "http://www.auburn.edu/~bakerbt";
}
Table 3
DOM Window.location object
6. PRACTICAL APPLICATIONS
The concepts used in the medical diagnostic application described above are applicable to
several other areas where there is a need to extend the presence of humans who possess
knowledge and experience in focused areas.
6.1. Clinical trainer
A large part of a physician's training is hands-on clinical involvement. This is where the
physician learns to apply knowledge gained in medical school to real world scenarios.
The effectiveness of traditional clinical training is challenged by local demographics, the
number of patients that can realistically be seen and the limited exposure the physician
has to a large spectrum of conditions or illnesses. Using the diagnostic application as a
guide, the physician can augment experience in an area by performing drills with the
application for a given condition and demographic segment.
6.2. Autonomous planning agent
Heavy industry installations spend millions of dollars each year on asset maintenance and
the work controls process. A medium-sized plant often has over a dozen technical
workers dedicated to planning work. Their primary job is to provide repair plans so
equipment can be fixed and returned to service with a minimum impact on production.
An important, and often difficult, aspect of developing a repair plan is to diagnose the
cause of the reported problem. The challenge in many cases is that the problem is not
documented very well and the planner rarely has occasion to talk to the person reporting
the problem. A solution to this would be to allow users to report the problem to a
planning agent. The planning agent could ask relevant questions at the time the problem
is reported; this approach would both provide a more detailed problem description and
have the potential to establish the cause of the problem. Once the cause has been
identified the system could automatically generate a work plan.
7. FUTURE WORK
This application is a work in progress, a proof of concept, and as such is not intended to
be implemented as a complete solution. Figure 13 illustrates the mature architecture with
the current implementation shown in light blue. There are three general areas of additional
functionality needed to move this project to a production status. The first is a method for
documenting patient history, including demographic attributes, so that the diagnostic
agent can "prune" potential diagnoses based on that data. Although not essential to
obtaining a diagnosis, the ability to consider patient history and demographics can reduce
the number of attributes that need to be evaluated to arrive at a high confidence diagnosis.
The system would use demographic membership and patient histories to, in effect, prune
the knowledge network.
The second is a method for feeding back the results of diagnostic interviews into the
system. This would occur after a follow-up visit with the patient confirms that the
diagnosis was correct. The set of attributes evaluated to produce the diagnosis would be
fed back into the knowledge network by incrementing the connection strength for those
attributes. This practice allows the system to rapidly gain experience. The addition of an
“aging” rate on certain demographic attributes will make the system more responsive to
changes in social and cultural shifts.
The third enhancement is the creation of a multimodal learning interface where
humans can communicate via speech their experience to the machine in terms of
concepts, attributes and attribute values. This will be my next research area.
[Figure 13 depicts the Dynamic Diagnostic Multimodal Interface in its mature form: a
Knowledge Base connected to Demographic Attributes, a Demographic Perspective,
Patient History, and the Diagnostic Session with its Diagnostic Attributes and Diagnosis;
Follow-Up diagnosis confirmation feeds results back, and the User conducts a
Multimodal teaching session.]
Figure 13
Future work
8. REFERENCES
1. Larson, James A. “How to Converse with a Virtual Agent by Speaking
and Listening Using Standard W3C Languages”. Retrieved May 21, 2005
from http://www.larson-tech.com/Writings/VR.pdf
2. Larson, James A. “Standard Languages for Developing Multimodal
Applications”. Retrieved May 21, 2005 from http://www.larson-
tech.com/Writings/multimodal.pdf
3. W3C Multimodal Interaction Framework. Retrieved May 21, 2005 from
http://www.w3.org/TR/mmi-framework/
4. W3C Multimodal Architecture and Interfaces. Retrieved May 21, 2005
from http://www.w3.org/TR/mmi-arch/
5. Introduction to Cognitive Science Website:
http://www.unc.edu/depts/cogsci/123/connectionist3.htm
6. Pucher, Michael, Kepesi, Marian. "Multimodal Mobile Robot Control
using Speech Application Language Tags". Retrieved May 21, 2005 from
http://userver.ftw.at/~pucher/papers/mmrobot1.pdf
7. Salces, Fausto J., Llewellyn-Jones, David., Merabti, Madjid. “Multimodal
Interfaces in a Ubiquitous Computing Environment”. Retrieved May 21,
2005 from http://www.bath.ac.uk/comp-sci/hci/UK-
Ubinet%20Files/Llewellyn-Jones/FSainz-3rdUbinet.pdf
8. Villasenor-Pineda, L., Montes-y-Gomez, M., Caelen, J.. “A Modal Logic
Framework for Human-Computer Spoken Interaction”. Retrieved May 21,
2005 from
http://ccc.inaoep.mx/~mmontesg/publicaciones/2004/LogicFramework-
CicLing04.pdf
9. Wang, Kuansan. "SALT: An XML Application for Web-based Multimodal
Dialog Management". 2nd Workshop on NLP and XML (NLPXML-2002),
Taipei, September 1, 2002 (The 19th International Conference on
Computational Linguistics). Retrieved May 21, 2005 from
http://acl.ldc.upenn.edu/W/W02/W02-1715.pdf
10. Keogh, Eamonn J., Pazzani, Michael J. "Learning Augmented Bayesian
Classifiers: A comparison of Distribution-based and Classification-based
Approaches”. Retrieved May 21, 2005 from
http://www.ics.uci.edu/~pazzani/Publications/EamonnAIStats.pdf
11. Reitter, David., Panttaja, Erin Marie., Cummins, Fred. “UI on the Fly:
Generating a Multimodal User Interface”. Retrieved May 21, 2005 from
http://www.medialabeurope.org/research/library/reitter-
etal_uifly_2004.pdf
12. Panttaja, Erin Marie., Reitter, David., Cummins, Fred. “The Evaluation of
Adaptable Multimodal System Outputs”. Retrieved May 21, 2005 from
http://www.reitter-it-media.de/compling/papers/panttaja-
etal_evaluation_2004.pdf
13. Tomassi, Paul. “Logic and Diagnostic”. Retrieved May 21, 2005 from
http://www.ul.ie/~philos/vol3/gnostic.html
14. Fraser, Robin C. “Clinical Method: General Practice Approach”.
Butterworth and Co. London. 1987
15. Aydede, Murat. "The Language of Thought Hypothesis". The Stanford
Encyclopedia of Philosophy (Fall 2004 Edition), Edward N. Zalta (ed.),
Retrieved from http://plato.stanford.edu/archives/fall2004/entries/logic-ai/
16. Dietterich, Thomas G. “Machine-Learning Research: Four Current
Directions” The American Association for Artificial Intelligence.
Retrieved on May 21, 2005 from
http://www.aaai.org/Library/Magazine/Vol18/18-04/Papers/AIMag18-04-
010.pdf
17. Ragan, Brian, Zhu, Weimo, Kang, Minsoo, Flegel, Melinda.
“Construction of an Ankle Injury Diagnostic Decision Tree”. Retrieved
May 21, 2005 from
http://www.kines.uiuc.edu/labWebpages/Kinesmetrics/Presentations/Data
%20mining_02/Web-pdf/DMfinal_3.pdf
18. Moret, Bernard M. E. “Decision Trees and Diagrams” ACM Comput.
Surv. Vol 14-4, ACM Press, New York, 1982
19. Eardley, David D., Aronsky, Dominik, Chapman, Wendy W., Haug,
Peter J. "Using Decision Tree Classifiers to Confirm Pneumonia
Diagnosis". Retrieved on May 21, 2005 from
http://www.amia.org/pubs/symposia/D200520.PDF
20. Kahn, Charles E. Jr., M.D., Haddawy, Peter, Ph.D. "Optimizing
Diagnostic and Therapeutic Strategies using Decision-Theoretic Planning:
Principles and Applications”. Retrieved May 21, 2005 from
http://www.mcw.edu/midas/papers/Medinfo-1995.pdf
21. Druzdzel, Marek J., Diez, Francisco J. "Combining Knowledge from
Different Sources in Causal Probabilistic Models". Journal of Machine
Learning Research 4 (2003) 295-316, July 2003
An investigation into the physical build and psychological aspects of an inte...
 
MoneySafe-FinalReport
MoneySafe-FinalReportMoneySafe-FinalReport
MoneySafe-FinalReport
 
Final Year Project-Gesture Based Interaction and Image Processing
Final Year Project-Gesture Based Interaction and Image ProcessingFinal Year Project-Gesture Based Interaction and Image Processing
Final Year Project-Gesture Based Interaction and Image Processing
 
Final_Thesis
Final_ThesisFinal_Thesis
Final_Thesis
 
Blackbox security white paper april 27, 2012
Blackbox security white paper april 27, 2012Blackbox security white paper april 27, 2012
Blackbox security white paper april 27, 2012
 
Final Project
Final ProjectFinal Project
Final Project
 
Master Thesis: The Design of a Rich Internet Application for Exploratory Sear...
Master Thesis: The Design of a Rich Internet Application for Exploratory Sear...Master Thesis: The Design of a Rich Internet Application for Exploratory Sear...
Master Thesis: The Design of a Rich Internet Application for Exploratory Sear...
 
Computing Science Dissertation
Computing Science DissertationComputing Science Dissertation
Computing Science Dissertation
 
A.R.C. Usability Evaluation
A.R.C. Usability EvaluationA.R.C. Usability Evaluation
A.R.C. Usability Evaluation
 
Design_Thinking_CA1_N00147768
Design_Thinking_CA1_N00147768Design_Thinking_CA1_N00147768
Design_Thinking_CA1_N00147768
 
01 dissertation_Restaurant e-menu on iPad
01 dissertation_Restaurant e-menu on iPad01 dissertation_Restaurant e-menu on iPad
01 dissertation_Restaurant e-menu on iPad
 
CS499_JULIUS_J_FINAL_YEAR_PROJETCT_L_DRAFT
CS499_JULIUS_J_FINAL_YEAR_PROJETCT_L_DRAFTCS499_JULIUS_J_FINAL_YEAR_PROJETCT_L_DRAFT
CS499_JULIUS_J_FINAL_YEAR_PROJETCT_L_DRAFT
 
10.0000@citeseerx.ist.psu.edu@generic 8 a6c4211cf65
10.0000@citeseerx.ist.psu.edu@generic 8 a6c4211cf6510.0000@citeseerx.ist.psu.edu@generic 8 a6c4211cf65
10.0000@citeseerx.ist.psu.edu@generic 8 a6c4211cf65
 
Research: Developing an Interactive Web Information Retrieval and Visualizati...
Research: Developing an Interactive Web Information Retrieval and Visualizati...Research: Developing an Interactive Web Information Retrieval and Visualizati...
Research: Developing an Interactive Web Information Retrieval and Visualizati...
 
Image recognition
Image recognitionImage recognition
Image recognition
 
Bachelor's Thesis Sander Ginn
Bachelor's Thesis Sander GinnBachelor's Thesis Sander Ginn
Bachelor's Thesis Sander Ginn
 
Big data
Big dataBig data
Big data
 
A Philosophical Essay On Probabilities
A Philosophical Essay On ProbabilitiesA Philosophical Essay On Probabilities
A Philosophical Essay On Probabilities
 
Information modelling (Stefan Berner): Extract
Information modelling (Stefan Berner): ExtractInformation modelling (Stefan Berner): Extract
Information modelling (Stefan Berner): Extract
 

DMDI

  • 1. Dynamic Multimodal Diagnostic Interface A Report Submitted to the Graduate Faculty of Auburn University In Partial Fulfillment of the Requirements for the Degree of Master of Science In Software Engineering Auburn, Alabama July 13, 2005
  • 2. Dynamic Multimodal Diagnostic Interface Billy Thomas Baker, Jr. Certificate of Approval Juan E. Gilbert, Chair Assistant Professor Computer Science and Software Engineering Dean Hendrix Associate Professor Computer Science and Software Engineering Cheryl Seals Assistant Professor Computer Science and Software Engineering
  • 3. Dynamic Multimodal Diagnostic Interface Billy Thomas Baker, Jr. Abstract Multimodal interfaces are becoming increasingly important as a means of replacing or extending the human presence. This project demonstrates a system for performing diagnostic interviews using a Web-based multimodal interface. Specifically, this report outlines the design and implementation of a system that dynamically generates multimodal Web pages in conjunction with a diagnostic dialog manager. This system enables a layperson or physician to participate in a diagnostic conversation with the application where the goal is to arrive at a diagnosis or decision. From the developer’s standpoint, this project suggests a simple and effective approach for executing ad-hoc, context-dependent conversations in a multimodal interface. Design considerations, implementation details, a practical example and future work are presented.
  • 4. Acknowledgement I’ll begin by thanking Dr. Juan Gilbert. He has been a constant source of inspiration and his ability to offer encouragement at the right time has, in large part, made this project possible. I especially thank Dr. Gilbert for introducing me to the field of multimodal interfaces. Thanks go to Dr. Dean Hendrix and Dr. Cheryl Seals for supporting me as members of my committee. Thank you to my fellow students in the HCCL lab for taking the time to read and critique this report. The managers and executives at Southern Nuclear must be mentioned; their support over the last few years has been invaluable and their acceptance of my eccentricities is appreciated. Thanks to my son Dannon, who continues to vigorously engage me in insightful discussions on human cognition and machine learning. I thank my brother, Dr. Jim, who provided much of the rheumatologic diagnostic data and explained many of the alien terms to this layman. My wife Ann has been an eternal source of encouragement, support, and tolerance. Without the support of these good people I would not have gotten this far.
  • 5. v Contents 1. INTRODUCTION………………………………………………………………… 1 1.1. Approach………………………………………………..……………….. 2 2. BACKGROUND………………………………………………………………….. 3 2.1. Learning and Conversation………...…………………………………… 3 2.2. Diagnostic Process………………………………...……………………. 4 2.2.1. Medical Assessments………………………………………….. 6 2.2.2. Demographic Considerations……………….………………… 7 2.3. Multimodal Interaction Framework…………………………………….. 7 2.3.1 Input……………………………………………………………. 8 2.3.2 Output………………………………………………………….. 9 2.3.3 Interaction Management…………………………….…………. 10 2.3.4 Agent Functions……………………………………….………. 11 2.3.5 Session Component…………………………………….………. 11 2.3.6 System & Environment………………………………….……… 11 2.4 SALT……………………………………………………………………… 11 3. PROBLEM / SOLUTION………………………………………………………… 14 3.1 Problem…………………………………………………………………... 14 3.1.1 Access to Physicians…………………………………………… 15 3.1.2 Inconsistent Diagnostic Process……………………………….. 15
  • 6. 3.1.3 Knowledge – Experience Disconnect……………….………….. 16 3.2 SOLUTION………………………………………………………………. 16 3.2.1 Physician Task Outsourcing…………………………………. 17 3.2.2 Composite Experience…………………………………………. 17 4. DEVELOPMENT………………………………………………………………… 19 4.1 Requirements…………………………….……………………………….. 19 4.2 Functional Overview………………….………………………………….. 19 4.2.1 Dynamic Page Generation…….……………………………….. 21 4.2.1.1 Embedded SALT……………………………………… 22 4.2.1.2 HTML Form………………………………………….. 22 4.2.1.3 Dynamic Grammar Generation……………………… 23 4.2.1.4 Page Control…………………………………………. 23 4.2.2 Diagnostic Dialog Manager…………………………………… 23 4.2.3 Diagnostic Decision Process…………………………………... 25 4.2.4 Data Structure………………………………………………….. 28 5. IMPLEMENTATION……………………………………………………………. 31 6. PRACTICAL APPLICATIONS………………………………………………….. 37 6.1 Clinical trainer…………………………………………………………… 37 6.2 Autonomous planning agent……………………………………………… 37 7. FUTURE WORK…………………………………………………………………. 39 8. REFERENCES…………………………………………………………………. 41
  • 7. List of figures 1. Figure 1 - Connectionist Concept-Attribute Model Naïve Bayes Network........... 3 2. Figure 2 - Diagnostic Process……………………………………...…………… 6 3. Figure 3 - Multimedia Interaction Framework Overview…………………...….. 8 4. Figure 4 - Multimedia Interaction Framework - Input…………………………. 9 5. Figure 5 - Multimedia Interaction Framework - Output…………………........... 10 6. Figure 6 - Component Diagram……………………………………...…………. 21 7. Figure 7 - Sequence Diagram………………………………...…….…………… 24 8. Figure 8 - Decision Process…………………………………………..………… 26 9. Figure 9 - Basic Application Diagram……………………………..…………… 32 10. Figure 10 - Example1: Tennis Elbow…………………………………………… 33 11. Figure 11 - Example2: Diagnosis Transition…………...………………………. 34 12. Figure 12 - Example3: Diagnosis………………………………………………. 35 13. Figure 13 - Future Work………………………………………………………... 40
  • 8. List of Tables 1. Table 1 - Concept – Attribute Pairs……………………………………………. 29 2. Table 2 - Attribute Arguments…………………………………………………. 30 3. Table 3 - DOM Window.location object………………………………………. 36
  • 9. 1. INTRODUCTION Humans learn (acquire knowledge) by experiencing their environment in terms of primitive attributes related to their senses. Since most humans share the same array of senses, they share a common set of primitive attributes that can be conveyed as concepts to other humans via conversation or other means of communication. Speech is the primary means humans have used in the past to communicate their experience to others. The advent of writing enhanced communication with an “almost” time-independent medium. Written concepts could be clearly communicated to people hundreds of years in the future or thousands of miles away. The printing press extended the influence of writing by making the knowledge and the experience captured by writing available to all who could read and obtain the media. A few hundred years later, computers and the Internet are now positioned to make the printing press irrelevant and have made communication effectively independent of time. Concepts and knowledge can be shared with people all over the planet within seconds of having been created, or preserved for use decades later. Today we are on the threshold of another step change in communication: machines are beginning to act as our agents. They can collect information specific to our domain of interest and, within the context of our lifestyle and needs, may use that information to enhance our knowledge and awareness of our environment. People will in effect “outsource” much of the tedious and labor-intensive aspects of data mining to agents, who in turn will help them answer questions, make decisions, recognize patterns and, in some cases, take action on their behalf.
  • 10. 1.1. APPROACH The goal of this project was to demonstrate the potential of combining aspects of artificial intelligence with a multimodal interface to deliver a human proxy for conducting diagnostic interviews: a diagnostic agent. Since interviews are not scripted but are context-dependent, the application has to be robust enough to select and ask questions by considering the information gained earlier in the conversation. Given the need to ask questions in an “ad-hoc” environment, the application also must be able to manage the dialog progress by generating multimodal interfaces on-the-fly. (11,12) The World Wide Web Consortium (W3C) has proposed a framework for multimodal interface implementation, and to the extent possible the application has been designed to conform to that framework.(3) Significant effort was made to keep the system as simple as possible; dialog management is implemented on the client rather than on the server. Page control and data flow between the interface and the dialog manager are accomplished on the client via the DOM (Document Object Model). The application also provides feedback to the user or co-user by presenting a graphic representation of the dialog scope and progress. In order to further simplify deployment, system reference data is deployed using simple static text files; no relational database management system is required. The application makes use of PHP (PHP: Hypertext Preprocessor) for server-side processing and data retrieval. The Web GUI is DHTML with embedded SALT (Speech Application Language Tags).
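The DOM-based page control described in the Approach could be sketched roughly as follows. This is an illustrative assumption, not code from the report; the endpoint name `nextQuestion.php` and the query parameter names are invented for the example.

```javascript
// Hypothetical sketch of client-side page control via the DOM.
// buildNextPageUrl assembles the query string that carries the user's
// answer back to the server-side PHP page generator; submitAnswer then
// navigates there by assigning to window.location.
function buildNextPageUrl(endpoint, questionId, answer) {
  const params = new URLSearchParams({ q: questionId, a: answer });
  return endpoint + "?" + params.toString();
}

function submitAnswer(questionId, answer) {
  // In the browser, assigning to window.location requests the next
  // dynamically generated multimodal page from the server.
  window.location.href = buildNextPageUrl("nextQuestion.php", questionId, answer);
}
```

The report's list of tables mentions the DOM Window.location object (Table 3), which is the mechanism this sketch assumes for moving data between pages.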
  • 11. 2. BACKGROUND 2.1. LEARNING and CONVERSATION This report will consider a concept as a collection of primitive attributes and other supporting concepts. The number of supporting concepts and primitive attributes that define a concept can be equated to a level of knowledge about that concept. (15,16) The strength of the associated interconnections between a concept, its attributes and supporting concepts can be directly related to one's ability to recognize that concept; interconnection strength is, in effect, one's level of familiarity and experience with that concept. Figure 1 Connectionist Concept-Attribute Model Naïve Bayes Network Machine learning can be defined as the process by which a machine acquires knowledge or experience. In the connectionist model it is postulated that networks
  • 12. learn by changing the strengths of their interconnections and/or establishing new interconnections in response to feedback (experience). Figure 1 illustrates the concept – attribute relationship for a Naïve Bayes Network. (10) A tuple (a collection of all the facts related to one entity, often a row in a table) representing a network relationship would include the concept, the attribute, the connection strength and attribute value at a minimum. 2.2. DIAGNOSTIC PROCESS Traditionally, theoretic diagnostic methods have been categorized as one or a combination of three primary reasoning techniques: probabilistic, deterministic and causal.(20) Probabilistic or statistical reasoning draws conclusions based on the statistical correlation between observed and reference attributes.(21) This technique lends itself to mathematical definition (Bayes Theorem ref.) where the diagnosis is promptly computed as soon as the pertinent attributes are assessed. Deterministic reasoning draws conclusions based on the outcome of a series of binary rules organized into logical progressions called decision trees.(18) The order of the rules in the tree is optimized to minimize the number of rules needed to reach an outcome. (19) Causal reasoning draws conclusions based on a comparison between actual conditions and a “causal model” representing normality. Potential cause mechanisms are either validated or excluded after comparison against the model. In actual practice, diagnosis is the process of identifying the cause of a problem or situation by identifying the distinguishing attributes of the problem or situation and
  • 13. then relating those attributes to the distinguishing attributes of potential causes of that problem or situation. When most effective, diagnosis is a combination of the three reasoning techniques described above, but it is accomplished automatically without any formal alignment with, or consideration of, the previously mentioned techniques. (13) Robin C. Fraser (1987) in his Clinical Method: A General Practice Approach (14) states: “In actual clinical practice, however, such an approach to clinical problem- solving is rarely used by general practitioners and infrequently used by hospital doctors because it lacks discrimination and has a poor yield in terms of the time and effort expended…..In reality, most clinicians reach diagnosis by a process of hypothetico-deductive reasoning, i.e. by educated guessing and testing”. Not surprisingly, a closer look reveals that diagnosis resembles the process humans unconsciously use to recognize all aspects of their environment; namely, a holistic (pattern) matching process of concepts and attributes. Pattern matching is not about absolute matches but more about establishing the best match for the smallest attribute set. This is the crux of the experience – knowledge relationship. Diagnosis can be divided into four distinct phases: problem recognition, problem or cause attribute correlation, attribute assessment and feedback. Given a problem or situation, correlation to potential causes results in one or more cause attributes being identified that can be assessed with respect to the related problem attribute in order to prove or disprove the potential cause. On a higher level, diagnosis is the process of evaluating the degree of attribute correlation between a problem and possible causes. The correlation process normally involves several attributes that may vary from deterministic
  • 14. attributes (e.g. always applicable for a condition) to probabilistic attributes (e.g. sometimes applicable to a condition). (17) The level of objectivity or subjectivity inherent in the attribute assessment phase further complicates diagnosis. Figure 2 Diagnosis: A Model. From Clinical Method, Robin C. Fraser, 1987. 2.2.1. MEDICAL ASSESSMENTS The medical assessment, sometimes referred to as an impression, is the process by which a physician evaluates patient medical history, family history, social environment-demographics and, if applicable, observes current symptoms. The assessment is normally initiated as a result of the patient communicating a complaint. The ultimate goal of the assessment is to reach a diagnosis of the complaint and, if warranted, propose a course of treatment. The validity of the diagnosis is dependent on both the completeness and accuracy of the patient’s medical and family histories
  • 15. as well as the thoroughness of the physician’s examination and dialogue with the patient. The degree of thoroughness exercised in an examination can be correlated to the degree of relevant experience the physician has with the condition being evaluated. As mentioned earlier, that experience is effectively the doctor's knowledge of the potential causes of the condition being evaluated and the observable attributes of those causes. Results of the assessment are typically documented on paper or transcribed from voice recordings and later reviewed by the physician for accuracy and completeness. 2.2.2. DEMOGRAPHIC CONSIDERATIONS The impact demographic attributes have on the accuracy of a diagnosis can be significant, but recognition of the influence of demographic-specific attributes can be difficult for a physician without exposure to large amounts of diagnostic data where the full spectrum of demographic variations is included. Access to data, other than what is gained via personal experience, is normally limited to that offered in medical journals or lectures. 2.3. MULTIMODAL INTERACTION FRAMEWORK The World Wide Web Consortium (W3C) is proposing a framework for multimodal interaction. Simplistically, the multimodal interaction framework comprises an Interaction Manager that accepts input from the user via one or multiple modes of input, such as speaking, typing, mouse or gestures. The Interaction Manager acts as liaison between the user and agent functions, session component and system / environment. Output from the
  • 16. agent functions is presented to the user via one or more modalities; most commonly speech and graphics. (1, 2) The approach used to implement the interaction manager varies with the application, but by far the most common Web application is the speech-enabled HTML form.(4) Figure 3 Multimedia Interaction Framework – Overview 2.3.1. INPUT The input component can be broken down into three sub-components: recognition, interpretation and integration. The recognition component captures and translates user input into a form that is useful to the interpretation component. Speech is converted into text using language and acoustic models along with a speech recognition grammar. Mouse movement and clicks are converted to x-y positions and key presses are converted into text based characters. Other modes of input such as handwriting, DTMF, biometrics and vision would be translated in this component. The interpretation component further processes input from the recognition component, primarily in cases where more than one recognition component input value has the same meaning or semantic intention. The integration component integrates the output of interpretation components to yield a synchronized and composite
  • 17. output that is routed to the interaction manager.(6) An example of integration would be synchronizing mouse input and speech input to yield a single user intention. Figure 4 Multimedia Interaction Framework – Input 2.3.2. OUTPUT The output component can also be broken down into three subcomponents: the generation component, the styling component and the rendering component. The generation component uses output from the interaction manager to determine the modality of information presented to the user. In the case of a multimodal Web page, the generation component would provide both the graphics and speech outputs. The styling component inserts layout information. In the case of speech, the “layout” information might be voice timbre, inflection and volume; in the case of graphics, layout is the familiar position, size, color, etc. The rendering component processes the information provided by the styling components into formats that
  • 18. the user can understand. Speech output is converted into a voice; graphics output is converted into text, controls and other graphic representations. Figure 5 Multimedia Interaction Framework – Output 2.3.3. INTERACTION MANAGEMENT The interaction management component coordinates the flow of interaction and execution between the input and output components. On receipt of input information from the input components, the interaction management component updates application context and information. The updated context and information are then routed to the output components. Several tools may be used to implement the interaction manager. Those tools include HTML, XHTML, Speech Application Language Tags (SALT), C, C++ and X+V (XHTML plus Voice).
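The input → context update → output cycle just described can be illustrated with a small sketch. This is illustrative only; the function and field names are not taken from the report.

```javascript
// Illustrative interaction-manager cycle: each input event updates the
// application context, and the updated context is routed to every
// registered output component (e.g. a graphics renderer and a speech
// synthesizer).
function createInteractionManager(outputComponents) {
  const context = {}; // application context and information
  return {
    onInput(event) {
      context[event.field] = event.value;                  // update context
      outputComponents.forEach(render => render(context)); // route to outputs
      return context;
    }
  };
}
```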
  • 19. 2.3.4. AGENT FUNCTIONS Agent functions evaluate the interaction state provided by the interaction manager and respond with program flow directives. Business and process logic are conveyed from agent functions to the user by way of the interaction manager. 2.3.5. SESSION COMPONENT The session component provides an interface for requesting and releasing session resources for distributed applications where one or more devices or users are involved. The session component is also instrumental in managing applications that require persistence and in managing resources in distributed environments. 2.3.6. SYSTEM & ENVIRONMENT The system and environment component will facilitate dynamic adaptation to changes in device capabilities, environmental conditions and user preferences. This component will modify the actions of the interaction manager as the number of devices and users changes; both distributed and stand-alone implementations must be supported. 2.4. SALT Speech Application Language Tags (SALT) is an XML specification for elements that can be embedded into an application to provide input/output control of speech recognition and speech synthesis. SALT was contributed to the W3C in 2002 by the SALT Forum, an industry group supported by Microsoft, Intel, Cisco, Comverse, Philips and ScanSoft (originally SpeechWorks). Unlike VXML, SALT contains no flow control structures; interaction flow must be provided by the host language. Elements that capture user speech input are
  • 20. called listeners; elements that provide speech output are called prompts. A brief overview of the four top-level SALT tags follows: • <listen> for speech input; a speech input object is instantiated in the XML document when this tag is encountered. The listen element also contains grammar, binding and recording controls: o <grammar> specifies or references the domain of words and phrases that the system will recognize. The actual grammar can be implemented as either an integral part of the page or it can be contained in a separate file and referenced via a uniform resource identifier. o <bind> integrates speech with host application logic by binding the spoken input value into the page. o <record> records sounds, speech, etc. • <prompt> for speech output: a speech output object is instantiated in the XML document when this tag is encountered. The prompt element also supports the binding controls described above. • <dtmf> for touch-tone input • <smex> for platform messaging to enable platform call-control and telephony features. This element also contains the binding control for binding platform messages. All four top-level elements contain the <param> element. This element is used to extend SALT elements with new functions. SALT as a whole can be extended with new functionality using XML. SALT pages can be viewed as being composed of three primary sections: data, presentation and script. The data section defines the information the user will provide to the
  • 21. application in order to meet sub-goals of the page. The presentation section contains speech prompts, grammars and GUI objects. The script section manages dialog flow and also manipulates the presentation section with various procedures. The modular aspects of a SALT page allow the developer to approach multimodal solutions in much the same way traditional GUI design is approached. The design goal is achieved using a page-based approach where goal sub-tasks are addressed on a page-by-page basis. (9) The modular structure of a SALT page supports the Multimedia Interaction Framework described in section 2.3 above.
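Putting the tags above together, a minimal SALT fragment embedded in an HTML page might look like the following sketch. The element ids, the grammar file name `yesno.grxml` and the bind target are invented for illustration; they are not from the report's implementation.

```html
<!-- Hypothetical example: speak a question, listen for the answer,
     and bind the recognized value into an HTML form field. -->
<input name="txtAnswer" type="text" />

<salt:prompt id="askQuestion">Do you have pain in your elbow?</salt:prompt>

<salt:listen id="getAnswer">
  <salt:grammar src="yesno.grxml" />
  <salt:bind targetelement="txtAnswer" value="//answer" />
</salt:listen>
```

This mirrors the data / presentation split described above: the HTML input holds the data, while the prompt, listen, grammar and bind elements form the presentation.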
  • 22. 3. PROBLEM / SOLUTION 3.1. Problem The process by which a patient gets resolution of a complaint via common clinical methods consumes significant physician resources and is therefore beyond the reach of many people who do not have access to a doctor or who cannot afford the services of a doctor. Furthermore, there are significant inconsistencies with respect to the scope and depth of diagnostic methods employed by one physician compared to other physicians facing similar scenarios. Finally, since often only the outcome of the diagnostic process (the diagnosis and recommended treatment) is documented, the methods and thought processes pertaining to the diagnosis are rarely communicated or shared with other physicians. Only the attending physician gains experience from a given diagnostic effort. This project proposes an approach that attempts to address these issues, specifically: 1) Limited access to physicians: Reduce the amount of physician resources required to perform a patient assessment and diagnosis. 2) Inconsistent Diagnostic process: Reduce the inconsistencies in diagnostic efficiency and accuracy between physicians when addressing a specific complaint. 3) Knowledge – Experience disconnect: No viable method for sharing the results of diagnostic efforts to improve the overall level of experience of the physician community and diagnostic agents.
  • 23. 3.1.1. Access to Physicians Access to physicians is expensive, time-consuming and may be delayed by weeks or months depending on that physician’s schedule. In all likelihood, the people reading this report can both afford the expense of a visit to the doctor and have the disposable time to devote to the visit, but that is not the case for many others. The challenge then is to extend the doctor's presence by “out-sourcing” some of what the doctor traditionally does to other resources that are less expensive and more accessible. Family and medical history forms that we now fill out prior to seeing a physician are simple, low-tech examples of a trend in that direction. The problem with forms in general is that they require a certain level of reading and writing skill or vocabulary that a patient might not possess. Most people are more comfortable just talking to someone and answering a few relevant questions. 3.1.2. Inconsistent Diagnostic Process Accurate diagnosis relies on the ability of the physician to match attribute patterns typical of a condition or disease to those attributes exhibited by the patient. Effective and efficient diagnosis relies on the ability of the physician to focus on the most significant attributes and not be distracted by applicable but less significant attributes. The physician’s ability to diagnose is framed by his experience and on-hand reference, or lack thereof. Inconsistencies between physicians in both accuracy and efficiency arise when their experience and knowledge vary. This is especially evident when a physician is exposed to a
  • 24. patient demographic that is different from the demographics on which his training or previous practice was based. The challenge here is to make available a system that employs a consistent approach to applying experience to the diagnostic process; more specifically, provide a system that uses greedy pattern matching as an alternative to more common clinical methods (i.e. educated guessing and testing). 3.1.3. Knowledge – Experience Disconnect Since little of the actual thought process or methods employed by a physician during diagnostic efforts is documented, the only one who stands to benefit from the knowledge and experience gained during diagnostic interviews is the physician conducting the interview. Even then, the passage of time will erode the physician’s recollection and that experience will be lost to everyone. The challenge then is to create a framework that can provide both a means of saving diagnostic decision processes and also a means of distilling those decisions and sharing them as composite experience with physicians and their (our) agents. 3.2. SOLUTION Design and implement a system that will extend the physician's presence by performing diagnostic interviews based on a consistent and logical evaluation of causal attributes. The application will conduct the interview as a spoken question-and-answer session between the patient and the machine with a medical attendant present. Create composite experience. Data will be structured such that it can be
  • 25. updated to reflect the experience gained from each successful diagnosis performed by the system. This is accomplished by adding new concept – attribute pairs or by refining concept – attribute pair connection strength as experience is gained. 3.2.1. Physician Task Outsourcing The application will perform those tasks that deal with data collection and it will also perform an initial diagnosis. Just as is the case during interviews conducted by physicians, application-generated questions are phrased using the layman’s equivalent of the medical attribute being evaluated and then spoken to the patient. Each successive question is based on the answers given to those questions asked earlier in the interview. Both the direction of the diagnostic conversation and the ultimate diagnosis are dependent on the patient’s answers. Consistency is achieved at the diagnostic level not by the specific questions asked, but by the relevance of the questions asked. 3.2.2. Composite Experience Composite experience is the ability of the system to leverage case history and diagnostic methods across physician and practice divides. Each time a diagnosis is made, the system’s data is structured so that it can be updated to reflect the impact on the system’s experience. The strength of the connection between the diagnosis and the complaint can be adjusted based on either positive or negative feedback. Attributes that were used to make the diagnosis can also be updated
with respect to the strength of their connection to the diagnosis. Thus experience becomes equivalent to the collective levels of connectivity between a condition or disease and those attributes that define it. The experience of the system should grow in proportion to its level of use, just as our experience grows with our involvement in a process or endeavor. To better reflect the impact of time on experience, it is proposed that the connection strength of certain attributes should decay over time such that changes in disease patterns, especially with respect to demographics, can be detected more readily. This is discussed under future work in more detail.
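The feedback and decay mechanics described above can be sketched as follows. This is an illustrative sketch only; the function names, the feedback step size and the half-life constant are assumptions, not taken from the actual implementation.

```javascript
// Illustrative sketch: connection strength as composite "experience",
// with feedback updates and time-based decay. All names and constants
// here are hypothetical.

// A concept–attribute pair with a connector weight, per the report's data model.
function makePair(concept, attribute, weight) {
  return { concept, attribute, weight };
}

// Positive or negative feedback after a confirmed (or refuted) diagnosis
// strengthens or weakens the connection.
function applyFeedback(pair, confirmed, step = 10) {
  pair.weight = Math.max(0, pair.weight + (confirmed ? step : -step));
  return pair;
}

// Exponential decay so stale experience fades, letting shifts in disease
// or demographic patterns surface sooner.
function decay(pair, elapsedDays, halfLifeDays = 365) {
  pair.weight *= Math.pow(0.5, elapsedDays / halfLifeDays);
  return pair;
}
```

Under this scheme a pair confirmed often and recently dominates the relevancy ranking, while one that has not been reinforced for a long time gradually loses influence.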
4. DEVELOPMENT

4.1. Requirements

The high-level requirements for the system are as follows:
• Keep it simple.
• Implement the W3C multimodal framework to the extent possible.
• Provide a generic, knowledge-domain-independent solution.
• Provide a simple and portable data representation of the concept-attribute relationship that includes connection strength and attribute value domain.
• Provide a simple and portable data representation of standard attribute properties with respect to both graphic and speech output generation.
• Provide a method for examining concept attributes in a logical, context-aware sequence using a minimum number of questions while yielding a high-confidence concept match.
• Provide a dynamic multimodal interface for gathering user input via both conversation and pointing device.
• Provide a graphic representation of the progress of the input-gathering effort with respect to pattern matching. The distinction between positive and negative evaluations must be intuitive.
• Provide a tool to view dialog manager status.

4.2. Functional Overview

The application is structured as three distinct layers: client, business and data. The client layer implements the input aspects of the multimodal framework and also hosts the dialog manager and agent functions. At the client layer, tasks are divided between
the parent page and the multimodal page. The parent page hosts the dialog manager and agent functions; the multimodal page implements the multimodal framework input functions. The business layer implements the output aspects of the multimodal framework, specifically page generation, styling and rendering. Data connectivity is also supported at this layer. The data layer comprises two data files: one defines the knowledge network, and the other defines attribute properties that are used by the page generator for ad-hoc multimodal page creation. Figure 6, below, illustrates the application component relationships.
Figure 6 Component Diagram

4.2.1. Dynamic Page Generation

The multimodal page generator constructs HTML+SALT pages based on directives from the DDM. The basic tasks performed by the page generator are inline grammar generation, HTML / SALT generation and input validation. For the purposes of this discussion, page generator tasks are viewed as belonging to one of two areas: output component generation or input component generation.
Output components include grammar and HTML / SALT generation, while input covers input validation. The generated page supports two distinct types of media, what the user can say and what the user can see: grammar and HTML, respectively. Grammar generation and HTML generation with embedded SALT is a single, indivisible operation. The dialog manager passes an attribute key to the page generator. The page generator retrieves an attribute query set corresponding to the attribute key from the server. The query set contains the attribute key, a visual cue, an audio cue, an input type and valid input values. Using the query set as the argument, a Web page is generated containing embedded SALT, an HTML form and an inline grammar. Standard input handling functions and slot handling are also inserted into the page.

4.2.1.1. Embedded SALT

The multimodal page generator embeds SALT tags into the generated page based on the property context of the focus attribute. The <listen>, <prompt>, <grammar> and <bind> tags are embedded with the appropriate arguments.

4.2.1.2. HTML Form

The multimodal page generator creates an ad-hoc HTML page containing a <form> tag and <input> tags. The <input> tag type, label and value are dictated by the property context of the focus attribute. Invisible <input> tags are created to serve as session control flags.
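The generation step described above can be sketched as a function from a query set to a page fragment. This is a schematic sketch, not the actual implementation: the field names (`attribute`, `visualCue`, `speechCue`, `inputDomain`) and the exact SALT markup emitted are assumptions for illustration.

```javascript
// Hypothetical sketch of the dynamic page generator. Given an attribute
// query set, emit an HTML fragment with embedded SALT-style <prompt>,
// <listen>, <grammar> and <bind> tags plus an HTML form. The markup is
// schematic, not a complete SALT document.
function generatePage(querySet) {
  // The HTML form mirrors the inline grammar so pointer and speech input
  // share one input domain.
  const options = querySet.inputDomain
    .map(v => `<option value="${v}">${v}</option>`)
    .join('');
  return [
    `<prompt id="q_${querySet.attribute}">${querySet.speechCue}</prompt>`,
    `<listen id="a_${querySet.attribute}">`,
    `  <grammar>${querySet.inputDomain.join('|')}</grammar>`,
    `  <bind targetElement="answer" />`,
    `</listen>`,
    `<form>`,
    `  <label>${querySet.visualCue}</label>`,
    `  <select id="answer">${options}</select>`,
    `</form>`,
  ].join('\n');
}
```

The point of the sketch is that a single query set drives all three output components at once: the spoken prompt, the visible form and the grammar that constrains the recognizer.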
4.2.1.3. Dynamic Grammar Generation

An inline grammar is generated using arguments retrieved from the attribute properties data set for the attribute being evaluated. The data set defines the domain of acceptable responses within the context of the prompt. Due to the specificity of the questions asked by this application, the data domain is primarily one that evaluates a positive or negative response.

4.2.1.4. Page Control

JavaScript functions are embedded by the page generator to provide page control. Page results are passed back to the dialog manager, along with a process request, using the Document Object Model.

4.2.2. Diagnostic Dialog Manager

The diagnostic dialog manager organizes and evaluates concepts and attributes related to a subject area domain, in this case a provisional list of potential problems / diagnostic possibilities. Acting on a key word or words in the user's response, the DDM initializes the diagnostic interview by importing all relevant concepts and their related attributes for evaluation. The diagnostic process used by the application is based on selecting the next most relevant question within the context of the conversation's progress up to that moment.
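The "next most relevant question" rule can be sketched as a two-level selection: rank the concepts, then rank the attributes within the top concept. The data shapes and function names below are illustrative assumptions, not taken from the actual implementation.

```javascript
// Sketch of the DDM's relevancy rule: ask about the strongest unanswered
// attribute of the currently strongest concept. "pairs" are concept–attribute
// pairs with connector weights; "asked" is the set of attributes already asked.
function nextAttribute(pairs, asked) {
  const candidates = pairs.filter(p => !asked.has(p.attribute));
  if (candidates.length === 0) return null; // all attributes characterized

  // Level 1: rank concepts by total connector weight over unanswered attributes.
  const byConcept = new Map();
  for (const p of candidates) {
    byConcept.set(p.concept, (byConcept.get(p.concept) || 0) + p.weight);
  }
  const topConcept = [...byConcept.entries()].sort((a, b) => b[1] - a[1])[0][0];

  // Level 2: within that concept, take the attribute with the highest weight.
  return candidates
    .filter(p => p.concept === topConcept)
    .sort((a, b) => b.weight - a.weight)[0].attribute;
}
```

Because the ranking is recomputed after every answer, a single response that shifts weight toward another concept immediately redirects the line of questioning, which matches the behavior described in the examples later in this chapter.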
Figure 7 Sequence Diagram

Dialog Process Flow
1. The system retrieves the area of interest domain.
2. The system constructs a Web page with embedded SALT and an inline grammar based on the area domain key words.
3. The user identifies the area of interest by uttering a phrase that contains the area of interest keyword(s) or by using the mouse to pick an entry from the list of values presented by the HTML form.
4. The diagnostic dialog manager (DDM) retrieves the associated domain of concept – attribute pairs and organizes them based on concept – attribute pair connector strength.
5. The DDM selects the most relevant concept – attribute pair and calls the dynamic page generator with a reference to the selected attribute.
6. The dynamic page generator retrieves attribute arguments from the data source and constructs a multimodal page representing the first question of the diagnostic conversation. The question pertains to the attribute with the highest attribute strength related to the concept with the highest attribute strength.

4.2.3. Diagnostic Decision Process

Entry into the decision process occurs at the point when the user answers the first question posed by the dialog manager.
Figure 8 Decision Process

Decision Process
7. The user utters a response to the question or selects a response from the Web page list of values.
8. The user's answer to the question is passed back to the DDM. The DDM updates the value of all concept – attribute pairs with the same attribute based on that response.
9. The DDM performs a reconciliation pass through the attribute domain to produce a pattern matching score, grouped and aggregated by concept, based on the updated attribute values.
10. Depending on the conversation base, a set number of questions must be answered before the dialog manager transitions from a lockstep progression to a pattern matching progression. In either case, the DDM selects the next concept – attribute pair and hands it off to the page generator.
11. Steps 5 through 8 are repeated until either a pattern matching score equals or exceeds the set confidence threshold or all questions have been asked.

As discussed earlier, relevancy in a pattern matching or diagnostic process is based on the magnitude of the concept – attribute pair connector. When several concepts are being evaluated as a potential match for an initial condition, relevancy must be established at two or more levels. In other words, the most relevant question would be
one that tests the most relevant attribute of the most relevant diagnosis for the expressed condition. Each time an answer is provided to a question, diagnosis relevancy may change depending on the context of that answer. If another diagnosis becomes more relevant, the system immediately shifts its line of questioning to address attributes that pertain to the most relevant diagnosis. The process of questioning and evaluation continues until either the minimum threshold for percent confidence is exceeded or all attributes for all potential diagnoses have been characterized.

4.2.4. Data Structure

In order to eliminate the need for a database server and preserve application simplicity, data is stored as tab-delimited text files. The diagnostic data structure supports the "experience" aspect of the diagnostic process, the visual aspects of the speech interface, and the linguistic and semantic aspects of an inline grammar. The Concept – Attribute data structure comprises tuples with five properties: Type, Concept Description, Attribute Description, Connector Weight and Positive Response. The Concept – Attribute data structure represents a simple Naïve Bayes network.
Component           | Description                                                                 | Examples
Type                | Entry type: Concept or Attribute                                            | c; a
Concept Description | If type="c", Area Description; if type="a", Concept Description             | elbow; medial_epicondylitis
Attribute Description | If type="c", Concept Description; if type="a", Attribute Description      | medial_epicondylitis; flexation_pain
Connector Weight    | Connection magnitude; relevancy                                             | 456; 200
Positive Response   | The response value that satisfies a true state for the attribute (not used for type="c") | Yes

Table 1 Concept – Attribute Pairs

The Attribute Argument data structure defines the verbal and visual cues needed to communicate a question about a specific attribute. Information in the attribute argument data structure is retrieved by the page generator. Each tuple in the data structure contains five properties: attribute, visual cue, speech cue, input type and input domain. The attribute property relates the two tables.
Component    | Description                                              | Examples
Attribute    | Attribute key; sets the relationship to the concept – attribute pairs | flexation_pain
Visual Cue   | FORM label                                               | Is the pain worse with resisted flexation?
Speech Cue   | SALT prompt                                              | Is the pain worse when you pull on something?
Input Type   | Input type                                               | YN
Input Domain | Allowed input value(s)                                   | Yes,No,Yep,Nop

Table 2 Attribute Arguments
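Because both files are tab-delimited text with the column orders shown in Tables 1 and 2, loading them reduces to splitting on tabs. The sketch below is illustrative only; the sample rows are invented, and the field names are assumptions layered on the table columns.

```javascript
// Sketch: parse the two tab-delimited data files into the structures of
// Tables 1 and 2. Column order follows the tables; sample data is invented.

// Table 1: type, concept description, attribute description, weight, positive response.
function parseConceptAttributePairs(text) {
  return text.trim().split('\n').map(line => {
    const [type, concept, attribute, weight, positive] = line.split('\t');
    return { type, concept, attribute, weight: Number(weight), positive };
  });
}

// Table 2: attribute key, visual cue, speech cue, input type, input domain.
function parseAttributeArguments(text) {
  return text.trim().split('\n').map(line => {
    const [attribute, visualCue, speechCue, inputType, inputDomain] = line.split('\t');
    return { attribute, visualCue, speechCue, inputType,
             inputDomain: inputDomain.split(',') };
  });
}
```

The `attribute` field is the join key: the DDM selects a concept – attribute pair from the first structure, then the page generator looks up that attribute in the second to obtain the cues and the grammar domain.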
5. IMPLEMENTATION

The application is implemented as a parent Web page with an inline frame supporting a dynamic multimodal page. Figure 9 represents a high-level view of the application. As described in previous chapters, the parent Web page hosts the dialog manager and the diagnostic agent. The parent page yields control to the multimodal page each time a multimodal prompt is generated. After the user responds, functions embedded in the multimodal page generate a semantic interpretation of the user's spoken or pointer-generated response. Page control functions in the multimodal page use the DOM to transfer the semantic interpretation to the parent Web page. The multimodal page then issues a reset command to the parent page. The reset method prompts the dialog manager to pass the semantic interpretation to the diagnostic agent for evaluation and network updates. At the conclusion of the updates, the diagnostic agent passes the next attribute to be evaluated to the dialog manager, which, in turn, updates the source of the inline frame with a call to the dynamic page generator using the attribute as the argument. A few scenarios are described below to provide a practical and more detailed understanding of the system in operation.
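The hand-off sequence just described can be made concrete with the DOM interaction reduced to plain function calls. This is a schematic of the control flow only; in the real pages the transfer happens through the Document Object Model between the inline frame and its parent, and all names here are hypothetical.

```javascript
// Schematic of the control hand-off: the multimodal page canonicalizes the
// user's raw input against the grammar's input domain, then the parent's
// reset path forwards the interpretation to the diagnostic agent and asks
// for the next attribute to evaluate.

function multimodalPageResult(rawInput, inputDomain) {
  // Map the spoken or clicked input to a canonical semantic interpretation.
  const match = inputDomain.find(
    v => v.toLowerCase() === rawInput.trim().toLowerCase());
  return { interpretation: match ?? null, processRequest: true };
}

function parentReset(dialogManager, result) {
  // Evaluation and network update, then selection of the next question.
  dialogManager.agent.update(result.interpretation);
  return dialogManager.agent.nextAttribute();
}
```

The returned attribute would then be passed to the dynamic page generator, which rewrites the inline frame's source and hands control back to the multimodal page for the next question.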
Figure 9 Basic Application Diagram

Figure 10 below shows the interface after the area "elbow" has been selected. Note that all potential diagnoses are listed in the page header and illustrated in the progress table on the left-hand side of the page. The progress table is organized by concept and attribute in descending connector strength. The initial visual prompt is "Is the pain focal", and the corresponding layman's-equivalent spoken prompt is "Is your pain only in a specific area?" At this point the system is waiting for either a speech or a pointer-generated reply from the user. Control of the application has been momentarily transferred to the multimodal page. When the user provides a response, the system will generate a semantic interpretation of the response. The interpreted response and application control will be returned to the parent page dialog manager. Figure 11 below illustrates the condition of the progress panel after several questions have been asked. Green entries represent attributes that have evaluated as a positive match to a potential diagnosis. Red entries are attributes that have evaluated as a negative match to a potential diagnosis, and the yellow entry is the attribute for which the system is currently requesting user input. The
illustration adjacent to the progress panel is a pop-up window that shows the internal status of the dialog manager. In this case the percentage positive weight is highest for lateral epicondylitis (tennis elbow); this indicates that the system will continue evaluating attributes for that diagnosis until the required minimum confidence threshold has been achieved. The pop-up is used primarily during application tuning and is not intended to be part of the user's toolset.

Figure 10 Example 1 – Tennis elbow

The next example illustrates the results of a diagnostic interview where the system has transitioned from the most common diagnosis for elbow conditions, lateral epicondylitis, to the least common diagnosis.
Figure 11 Example 2: Diagnosis Transition

The transition occurred when the patient indicated that there was local swelling. This caused the percent positive weight to shift to a diagnosis of arthritis. The system now pursues confirmation of this diagnosis by confirming additional attributes through questioning. Note that the next question pertains to both bursitis and arthritis; since both share the same attribute and value, the diagnostic focus is unlikely to shift based on this question. Figure 12 illustrates the status of the system following evaluation of the next two questions. At this point, the system has recognized that the minimum confidence threshold has been reached and is proposing a diagnosis. Note that a diagnosis of bursitis was possible, but based on the relative weighting of the attributes a diagnosis of arthritis is more likely.
Figure 12 Example 3: Diagnosis

A snapshot of the diagnostic session is saved for future reference and system updates. The intention is to document the diagnostic decision process in a format that can be used by a feedback mechanism (see Future Work). The dialog manager calls the dynamic page generator with the diagnosis and the associated confidence factor. The page generator creates a multimodal page with a diagnosis prompt and asks the user if there are other areas or conditions that need to be evaluated. If the user response is "Yes", the system updates the window.location object of the parent page with the referrer of the inline frame, in effect recalling the parent Web page.
if (ans == "Yes") {
    window.location = document.referrer;
} else {
    window.location = "http://www.auburn.edu/~bakerbt";
}

Table 3 DOM window.location object
6. PRACTICAL APPLICATIONS

The concepts used in the medical diagnostic application described above are applicable to several other areas where there is a need to extend the presence of humans who possess knowledge and experience in focused areas.

6.1. Clinical Trainer

A large part of a physician's training is hands-on clinical involvement. This is where the physician learns to apply knowledge gained in medical school to real-world scenarios. The effectiveness of traditional clinical training is limited by local demographics, the number of patients that can realistically be seen and the limited exposure the physician has to a large spectrum of conditions or illnesses. Using the diagnostic application as a guide, the physician can augment experience in an area by performing drills with the application for a given condition and demographic segment.

6.2. Autonomous Planning Agent

Heavy industry installations spend millions of dollars each year on asset maintenance and the work controls process. A medium-size plant often has over a dozen technical workers dedicated to planning work. Their primary job is to provide repair plans so equipment can be fixed and returned to service with minimum impact on production. An important, and often difficult, aspect of developing a repair plan is diagnosing the cause of the reported problem. The challenge in many cases is that the problem is not documented very well and the planner rarely has occasion to talk to the person reporting the problem. A solution would be to allow users to report the problem to a planning agent. The planning agent could ask relevant questions at the time the problem
is reported; this approach would both provide a more detailed problem description and have the potential to establish the cause of the problem. Once the cause has been identified, the system could automatically generate a work plan.
7. FUTURE WORK

This application is a work in progress, or proof of concept, and as such is not intended to be implemented as a complete solution. Figure 13 illustrates the mature architecture, with the current implementation in light blue. There are three general areas of additional functionality needed to move this project to production status. The first is a method for documenting patient history, including demographic attributes, so that the diagnostic agent can "prune" potential diagnoses based on that data. Although not essential to obtaining a diagnosis, the ability to consider patient history and demographics can reduce the number of attributes that need to be evaluated to arrive at a high-confidence diagnosis. The system would use demographic membership and patient histories to, in effect, prune the knowledge network. The second is a method for feeding the results of diagnostic interviews back into the system. This would occur after a follow-up visit with the patient confirms that the diagnosis was correct. The set of attributes evaluated to produce the diagnosis would be fed back into the knowledge network by incrementing the connection strength for those attributes. This practice allows the system to rapidly gain experience. The addition of an "aging" rate on certain demographic attributes will make the system more responsive to social and cultural shifts. The third enhancement is the creation of a multimodal learning interface where humans can communicate via speech their experience to the machine in terms of concepts, attributes and attribute values. This will be my next research area.
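The proposed demographic pruning step could be sketched as a simple filter applied before the interview begins. This is a hypothetical sketch of the idea only; the `excludedSegments` property and segment labels are invented for illustration and are not part of the current implementation.

```javascript
// Hypothetical sketch of history/demographic "pruning": drop candidate
// diagnoses that are implausible for the patient's demographic segments
// before questioning starts, shrinking the attribute set to evaluate.
function pruneConcepts(concepts, patient) {
  return concepts.filter(c =>
    !(c.excludedSegments || []).some(seg => patient.segments.includes(seg)));
}
```

A pruned concept contributes no concept – attribute pairs to the interview, so every question saved is a question the patient never has to answer.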
[Figure content: Dynamic Diagnostic Multimodal Interface; Knowledge Base; User Demographic Perspective; Patient History; Demographic Attributes; Diagnostic Session; Diagnostic Attributes; Diagnosis; Follow Up diagnosis confirmation; User Multimodal teaching session]

Figure 13 Future work