1. Speech Matrix® Systems
Automated Speech Recognition for Self-reports in Health Research
Esther Levin (Spacegate, Inc.; City College of New York), esther@spacegate.com
Alex Levin (Spacegate, Inc.), alex@spacegate.com
3. Background
Health Data Collection
Traditional methods
Paper based diaries
Observations
Modern methods – Electronic Data Capture (EDC)
PDA
Web based
IVR (touch-tone)
Next Step - EDC
ASR (automated speech recognition)
Confidential – Spacegate, Inc. 3
4. Electronic Data Capture –
Goals & Challenges
Real data in real-time
Data validation
Regulatory compliance
Increase clinical trial efficiency
Reduce time-to-market
Cost
Patient/Subject burden
5. Automated Speech
Technology – Science & Art
Voice to Data - voice data entry via automated
natural dialogue
Automated Dialogue – Voice Forms
Voice Interface Design
Real-time Monitoring & Reporting
Privacy & Security
Regulatory Compliance
6. ASR System for Data Collection
7. Why Speech?
Speech is a natural modality of interaction
The phone is user-friendly and ubiquitous; no special training is required for its use
Dynamic dialogue flow
personalization of both content and style based on the caller's profile and history
Real-time feedback and monitoring
real-time reports of captured data
Automated compliance monitoring
Flexible and extensive scheduling
Inbound/outbound call sessions
calls can be initiated by the system following a prescribed protocol
Overall, an ASR-based system offers an extensive and practical tool to facilitate efficient and convenient real-time, two-way communication.
8. Speech Matrix®-VDC™ Systems
Extensive data collection system for health, clinical, life science and behavioral research
Another branch of EDC
Real-time data capture in the participant's native environment
9. Applications
Pain Monitoring Diary (PMD™): prototype application based on questions in a standard pain questionnaire (the Brief Pain Inventory, BPI)
Drug Use Diary (VDMD™): prototype application based on questions in a standard drug use diary (Dr. Linda Sobell)
Implements Dynamic Questionnaires
Interactivity & Communications
Reporting and Management
10. Dialog Design
Task characteristics:
Need to guarantee data validity, accuracy and integrity, taking into account
speech recognition errors
improve the overall accuracy using dialog actions such as re-prompts,
confirmations, error handling, and, if necessary, recording and flagging the
unrecognized utterances for later transcription
The system should accommodate both novice and experienced callers
enough information and help to guarantee question understanding and successful session completion for novices
a short and effective call flow for experienced callers
Subjects identify themselves at the beginning of each session.
Opportunity to use the knowledge accumulated across sessions for
personalization.
Subjects may receive some training on the use of the spoken dialog system
during the enrollment session.
Dialog Design Issues:
controlling the accuracy of the captured data
adaptive level of user support
11. Controlling the Accuracy of Data Capture:
ASR 101
Speech Recognition Grammar Design
Example: yes/no grammar {yes, no}
The caller's utterance is matched against the possibilities described by the grammar
The output of ASR is the best matching ‘sentence’,
and a score
If the score is too low => rejection
Out-of-vocabulary utterances cannot be
recognized correctly
Design tradeoff: minimizing out-of-vocabulary utterances (which favors a larger grammar) vs. minimizing grammar size (which favors higher in-grammar accuracy)
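The grammar-matching and rejection mechanics above can be sketched in Python. This is an illustrative toy, not the actual recognizer: a real ASR engine matches acoustic models rather than text strings, and the similarity measure and threshold here are assumptions.

```python
# Toy sketch of grammar-constrained recognition with score-based
# rejection. A text-similarity score stands in for the acoustic
# matching score of a real recognizer.
from difflib import SequenceMatcher

def recognize(utterance, grammar, threshold=0.7):
    """Match an utterance against every sentence in the grammar and
    return (best_sentence, score), or ('<rejected>', score) when the
    best score falls below the rejection threshold."""
    scored = [(s, SequenceMatcher(None, utterance.lower(), s).ratio())
              for s in grammar]
    best, score = max(scored, key=lambda p: p[1])
    if score < threshold:
        return "<rejected>", score
    return best, score

yes_no = ["yes", "no"]
print(recognize("yes", yes_no))    # in-grammar: perfect match
print(recognize("no-no", yes_no))  # out-of-grammar: rejected
```

The same tradeoff from the slide shows up here: adding more sentences to `grammar` covers more out-of-vocabulary utterances, but enlarges the set the best match is chosen from.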
12. Controlling the Accuracy of Data Capture.
Improved rejection mechanisms to deal with out-of-vocabulary utterances.
System prompt: Was that your left shoulder?
User: no, left elbow
System prompt: I didn't get that. Was that your left shoulder? Please say 'yes' or 'no'.
Reliable confirmation recognition.
Using confirmations as the way to control the larger grammar’s accuracy.
Using recording to capture the out-of-grammar answers and problematic
user inputs.
System prompt: “Was that your left shoulder?”
User: “No”
System prompt: “Sorry about that. Let’s try it this way. Please choose carefully a body part from the
following list that best describes the location of your pain, and just say it. If none of the locations match,
please say ‘none of those’. Here is the list: abdomen <pause>, ankles …”
User (barges in): "none of those"
System prompt: “Ok. Let me just record your answer. Please describe the location of your pain in your
own words.”
User: <……>
System prompt (after recording is finished): “Thanks, I got that. Let’s move on.”
The recorded utterance is captured and flagged as "transcription needed" for later processing.
The same fall-back from recognition to recording is used after several repeated recognition failures.
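The confirmation-and-recording strategy above can be sketched as one dialog unit. The function name, arguments and return format are illustrative assumptions, not the Speech Matrix® API:

```python
# Hypothetical sketch of one dialog unit: confirm each recognized
# value; on 'none of those' or repeated failures, fall back to
# recording the answer and flag it for later transcription.
def capture_slot(recognize, confirm, record, max_attempts=3):
    for _ in range(max_attempts):
        value = recognize()              # e.g. the body-part grammar
        if value == "none of those":
            break                        # out-of-grammar: record it
        if confirm(value):               # reliable yes/no grammar
            return {"value": value, "needs_transcription": False}
    # fall back: capture audio and flag it for later processing
    return {"value": record(), "needs_transcription": True}

# A confirmed in-grammar answer:
ok = capture_slot(lambda: "left shoulder", lambda v: True, lambda: None)
# An out-of-grammar answer falls back to recording:
rec = capture_slot(lambda: "none of those", lambda v: True,
                   lambda: "<audio: utterance.wav>")
```

The confirmation step is what lets a large, error-prone grammar inherit the accuracy of the small yes/no grammar, as the slide describes.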
13. Adaptive Level of User Support-I.
Prompt Design.
"Where does it hurt? <pause> For example, your head, stomach or back? <pause> Remember, if you don't know how to answer this question, just say 'I need help'."
Context-sensitive help.
help information describes and clarifies the current question,
Provides examples of possible answers
Example: help for the “where does it hurt” question: “Okay.
Here is the help information. At this point I need to find out the
part of your body that hurts the most. Please choose carefully a
body part from the following list that best describes the location
of your pain, and just say it. If none of them matches, please say
‘none of those’. Here is the list: abdomen <pause>, ankles
<pause>, back <pause>,...( list continues) …, toes <pause>.
Which one is it?"
14. Adaptive Level of User Support-II
Detecting speech recognition failures.
The re-prompts are designed as an escalating list, providing increasingly
more information and progressively constraining the caller.
"Where does it hurt? <pause> For example, your head, stomach or back? <pause> Remember, if you don't know how to answer this question, just say 'I need help'."
"I didn't get that. Please tell me the part of your body that hurts the most. Remember, you could always say 'I need help'."
Detecting Misunderstandings.
the user says “no” to a confirmation question as in:
System prompt: Was that your left shoulder?
User: No.
System prompt: Sorry about that. Let’s try it this way. Please choose
carefully a body part from the following list that best describes the
location of your pain, and just say it. If none of them matches, please
say ‘none of those’. Here is the list: abdomen <pause>, … (list
continues). Which one is it?
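The escalating re-prompt list can be sketched as follows. The prompt texts are abridged from the slides; the selection function is an assumption about how the control logic might pick among them:

```python
# Sketch of escalating re-prompts: each recognition failure moves to
# a more explicit and more constraining prompt, capped at the last one.
PROMPTS = [
    "Where does it hurt? For example, your head, stomach or back?",
    "I didn't get that. Please tell me the part of your body that "
    "hurts the most. Remember, you could always say 'I need help'.",
    "Please choose carefully a body part from the following list ... "
    "If none of them matches, please say 'none of those'.",
]

def next_prompt(failures):
    """Return the prompt for the given failure count; repeated
    failures stay on the most constraining prompt."""
    return PROMPTS[min(failures, len(PROMPTS) - 1)]
```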
15. Adaptive Level of User Support-III
Dialog Personalization.
Taking advantage of a larger inter-session context by designing two types of data collection sessions: normal and follow-up.
Both content and style are different.
Reminding prompts: "Is the pain still in your left shoulder?"
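The session-type selection and "reminding" prompts can be sketched as follows. The pain threshold of 5 and the 4- vs. 24-hour call intervals are taken from the editor's notes; the report field names are hypothetical:

```python
# Sketch of inter-session personalization: the previous session's
# report drives both what the next session asks (follow-up vs. normal)
# and how it asks it ('reminding' prompts).
PAIN_THRESHOLD = 5  # per protocol, high pain triggers a follow-up

def plan_next_session(previous_report):
    """Schedule a follow-up in 4 hours if the reported pain exceeded
    the threshold; otherwise a normal session in 24 hours."""
    if previous_report.get("pain_intensity", 0) > PAIN_THRESHOLD:
        return {"type": "follow-up", "hours_until_call": 4}
    return {"type": "normal", "hours_until_call": 24}

def location_prompt(previous_report):
    """Use a 'reminding' prompt when a location is already known."""
    loc = previous_report.get("pain_location")
    if loc:
        return f"Is the pain still in your {loc}?"
    return "Where does it hurt?"
```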
16. NORMAL SESSION
System: Your PIN please.
Caller: ****
System: Ok, *****, let's begin. Are you in pain now?
Caller: yes
System: Where does it hurt? <pause> For example, your head, stom...
Caller: my left shoulder
System: Was that your left shoulder?
Caller: yes
System: Please rate your pain on a scale of zero to ten, with zero being no pain and ten being the worst possible pain.
Caller: it is seven
System: Was that 7? <pause> Please say yes or no.
Caller: yeah
System: How does the pain feel?
Caller: I don't know
System: Okay, here is the help information. At this point I need to find out the type of your pain. Please choose carefully a word from the following list that best describes your pain and just say it. If nothing matches, please say 'none of those'. Here is the list: aching, <pause> burning, <pause> dull...
Caller: it feels sharp
System: Was that 'sharp'?
Caller: yeah
System: Is the pain constant or does it come and go?
Caller: comes and goes
System: Do you have any other symptoms?
Caller: yeah
System: Please tell me your most disturbing symptom. For...
Caller: I have fever
System: I heard 'fever'. Is that right?
Caller: yes
System: Would you like to leave a message for the staff? …
Caller: no
System: Thanks for your cooperation. I've got all the information that I need. I'll call you again in 4 hours. Goodbye.
17. FOLLOW-UP SESSION
System: Your PIN please.
Caller: ****
System: Hi, Esther, this is your follow-up call … Are you still having pain?
Caller: yeah
System: Did you take your pain medication today?
Caller: yes
System: What time did you take the medication?
Caller: Two hours ago
System: Was that around 9:45 am?
Caller: yeah
System: Did the medicine diminish your pain?...
Caller: yeah
System: Please rate your pain on a scale of zero to ten, with zero being no pain and ten being the worst possible pain.
Caller: four
System: Was that 4?
Caller: yeah
System: Is the pain still in your left shoulder?
Caller: yes
System: Does the pain still feel sharp?
Caller: no
System: How does the pain feel?
Caller: It's dull
System: Was that 'dull'?
Caller: yeah
System: Is the pain constant or does it come and go?
Caller: It's constant
System: Do you have any other symptoms?
Caller: no
System: Would you like to leave a message for the staff? …
Caller: no
System: Thanks for your cooperation. I've got all the information that I need. I'll call you again tomorrow. Goodbye.
18. Normal session report
Slot | Captured Value | Confirmed (yes/no) | Confidence Score
Pin | **** | no | 66
Are you in pain? | yes | no | 80
Pain Location | left shoulder | yes | 86
Pain Intensity | 7 | yes | 88
Pain Type | sharp | yes | 88
Pain constant? | pain comes and goes | no | 47
Symptoms | fever | yes | 86
Message | none | no | 78
19. Follow-up session report
Slot | Captured Value | Confirmed | Confidence Score
Pin | ***** | no | 74
In pain? | yes | no | 85
Medication taken? | yes | no | 76
Medication time | 9:45 am | yes | 69
Medication helped? | yes | no | 75
Pain Rating | 4 | yes | 87
Pain Location | left shoulder | yes | 87
Pain Type | dull | yes | 86
Pain constant? | constant | no | 54
Symptoms | none | yes | 82
Message | none | no | 84
20. Usability Test
24 subjects
118 dialog sessions
113 completed
5 hang-ups
42 follow-up
1766 dialog turns
98% automatic data capture – the rest
flagged for transcription
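As a quick arithmetic check on the headline numbers above, 113 completed sessions plus 5 hang-ups accounts for all 118 sessions, a roughly 96% completion rate:

```python
# Completion-rate check for the usability test figures.
sessions, completed, hangups = 118, 113, 5
assert completed + hangups == sessions
completion_rate = round(100 * completed / sessions, 1)
print(completion_rate)  # 95.8
```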
21. Results
Metric | Mean (SD)
Session duration (sec) | 105.6 (46.78)
Number of dialog units per session | 7.85 (2.6)
Duration of a dialog unit (sec) | 13.46 (4.54)
Dialog turns per dialog unit | 1.88 (0.46)
Percentage of task-oriented turns | 80% (16)
Percentage of barged-in prompts | 66% (13)
Duration of a dialog turn (sec) | 7.19 (1.10)
Duration of a dialog turn with barge-in disabled (sec) | 10.63 (1.5)
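The per-session means are mutually consistent: dialog units per session times turns per unit roughly matches total turns per session, assuming the 1766 turns from the usability-test slide cover all 118 sessions:

```python
# Consistency check on the reported means.
units_per_session = 7.85   # dialog units per session (mean)
turns_per_unit = 1.88      # dialog turns per dialog unit (mean)
turns_per_session = 1766 / 118                  # observed, ~14.97
estimate = units_per_session * turns_per_unit   # predicted, ~14.76
# The two figures agree to within about 1.5%.
assert abs(estimate - turns_per_session) / turns_per_session < 0.02
```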
22. Summary
ASR & Spoken Dialog Methodology for data
capture can provide:
Additional real-time data collection tool
Flexible protocol design
Improved data validation and compliance
Centralized collection and monitoring
Telephone as ubiquitous device
System design needs to take into account the
specificities of the task and the limitations of the
technology
Flexible level of user support
Controlled accuracy of the captured data
Editor's Notes
Use of questionnaires is an essential method of data collection. Very often, research or study findings are based significantly or completely on questionnaire responses. While designing valid questionnaires is an art, the tools and methods of data collection are no less important and can often influence the research outcome. The life and behavioral science industries are constantly looking to address the challenge of reliable, real-time data collection, compliance and validation. Some offer the use of PDAs or the Web, but these also face their own array of issues and challenges. Traditional data collection methods vary from paper-based diaries and reports, to video/audio recordings, to human observation. However, doubts have been cast on the validity of data collected through paper-based methods of self-report, notably by a recent study demonstrating that most (79%) of the paper diary entries by patients were falsified (Stone & Shiffman). Recently, electronic data collection techniques utilizing hand-held computer devices (PDAs), the Internet, or cellular phones have been introduced. These methodologies enable collection of meta-data about the respondent's compliance and the use of such data to measure and improve compliance. The flexibility and open nature of the Internet has created expectations that all technologies offer innovative, flexible and revenue-generating features. The ultimate goal of electronic data capture (EDC) is multi-modal access, retrieval and collection of information via all current and future communication channels and devices, which is within our reach. The data should retain its natural format, but the means to access it should be flexible. While we continue to explore Internet capabilities, a new generation of technology is opening new possibilities for data access.
Some of the biggest challenges for pharmaceutical companies are the need to increase clinical trial efficiency, decrease time-to-market, and ensure data validation and regulatory compliance. Clinical trials of new products take considerable time and expense, and data collection and processing is generally considered a major bottleneck. Today, clinical trials professionals use several methods to capture and record research data. Paper dominates, but electronic data capture (EDC) tools, ranging from remote data entry (RDE), scanning or faxing, to the new generation of Web, PDA and IVR techniques have also been employed, each with its own issues.
Automated speech technologies and services are clearly expanding their reach from pure single-utterance recognition to more complex conversational, dialogue-based information distribution. The search for ultimate communication channels that have the reliability of the telephone and the richness of the Internet will continue for the near future, but today the advent of automated speech recognition, the voice web, near-human synthetic voices (TTS), and the VXML/SALT standards opens new possibilities for the design and deployment of practical applications. Implementation of a successful speech application involves a multidisciplinary approach encompassing established speech-application design principles and the unique requirements and limitations of the application domain. The ultimate goal of any application design is its usefulness and simplicity of use for its target audience. Dialogue-driven speech systems are widely implemented in various fields. Speech Matrix® applications are iterative and interactive. A natural dialogue is designed and presented with various degrees of complexity and depth to address issues like disambiguation, confirmation, verification and dynamic dialogue flow. Administrative functions and scripts are designed to comply with research protocols, such as event- or time-contingent calls, random call flow scripts and random/scheduled call sessions. The flexible dialogue design can put emphasis on a specific subject of research better than other available methods of data collection. Caller ID is used for identification and security. Voice verification techniques can add additional levels of security and authentication. Every utterance of the call session is recorded as a sound file and can be used as an audit trail. This also addresses regulatory issues of data preservation, authentication, validation and compliance.
Speech Matrix® is an innovative clinical research voice data collection system based on natural speech and choice of expression. People are accustomed to using telephone devices, and hence data input through the telephone has the inherent advantage of familiarity, simplicity and ease of access. Speech Matrix® allows data collection in real time in the patient's/participant's native environment. Speech Matrix® Voice Data Collection applications are receiving an enthusiastic reception from the clinical and behavioral research communities, including the National Institutes of Health.
The advantages of using this technology are: - Speech is a natural modality of interaction for humans, and the input device, the phone, is user friendly and ubiquitous, with no special training required for its use (as opposed to PDAs or computers). - Compliance is monitored automatically: the calls can be initiated by a system following a prescribed protocol, and the system can report any non-compliance to the trial administrator in real time. - Spoken automated dialog reaches well beyond voice-enabling static paper questionnaires: possible answers are not limited by the number of check-boxes that fit on a piece of paper; question selection can be done dynamically based on previous answers; personalization of both content and style based on the patient's history is possible. - The ability to transform the captured data into real-time reports, and further interface the information with other clinical or back-office systems and databases, provides an unparalleled opportunity to enhance patient feedback and monitoring. Overall, an ASR-based system offers the caregiver an extensive and practical tool to facilitate efficient and convenient patient communications, which saves time while increasing quality of care.
For this study we implemented a dialog system for chronic pain patients' assessment and monitoring. In the US alone, an estimated 10 million-plus individuals live with chronic pain, and recently the Joint Commission on Accreditation of Healthcare Organizations called pain the fifth vital sign that providers should monitor in the care of patients [2], along with temperature, pulse, respiration, and blood pressure. Pain assessment is also an application for which well-established standard questionnaires [2][3][4] are available, and the vocabulary for potential answers can be established from the medical literature. Figure 1 shows the dialog flow for the Pain Monitoring Diary. The dialog flow is represented as a series of dialog units, where each unit comprises several caller-system exchanges designed to elicit one piece of information from the caller to fill a slot in the session report. Figure 2 shows a transcribed session and its corresponding report, automatically generated by the system at the end of the session.
The characteristics and requirements of the data capture task are different from those of other applications of spoken dialog technology. Successful dialog design needs to take the following specificities of this task into account: - The subjects participating in data collection are enrolled through a personal face-to-face interview at which they receive relevant information about the trial and guidance on the process of data collection. At the same opportunity the patients can receive some training, explanation and possibly a demo of how to use the spoken dialog system. - Subjects call the system repeatedly according to the study protocol, and identify themselves at the beginning of each session. This provides an opportunity to use the knowledge accumulated across sessions for personalization. - The system should accommodate both novice callers (at the beginning of the trial) and experienced callers (those who have completed several sessions). For the experienced caller, the system needs to provide a short and effective call flow, without making the caller hear long and tedious prompts. For the novice caller, the system needs to provide enough information and help to guarantee question understanding and successful session completion. - Data validity, accuracy and integrity are very important in this application, since the penalty for an erroneously filled final session report can be very high. Since automated speech recognition technology is not perfect, the design has to take into account the possibility of speech recognition errors and improve the overall accuracy using dialog actions such as re-prompts, confirmations, error handling, and, if necessary, recording and flagging the unrecognized utterances for later transcription. In the design of the dialog we addressed these task characteristics by providing an adaptive level of user support and controlling the captured-data accuracy.
High and controllable accuracy. We designed the system to take into account the limitations of current speech recognition technology and to be able to control the overall system accuracy. In general, the most important parameter determining the accuracy of speech recognition is the size and complexity of the grammar used for recognition of the current utterance. The grammar in speech recognition describes (as text) the set of all possible sentences that can be recognized by the system. For example, the simplest grammar that can be used in yes/no recognition contains only two sentences: {"yes", "no"}. During speech recognition, the current user utterance is matched against every sentence in the grammar, and the recognition result is given as the best-matching sentence within the grammar, together with a score that measures the quality of the match. If the quality of the match is not good enough, the recognizer will output a 'rejection'. The size of the grammar used for recognition plays an important role in the expected accuracy of recognition results. On one hand, the smaller the grammar, i.e., the smaller the set of possible answers to distinguish between, the higher the accuracy for 'in-grammar' user utterances. In-grammar means that the user has spoken one of the sentences covered by the grammar, while out-of-grammar utterances are those not described by the grammar. For our previous example, if the user simply says "yes" or "no", those are in-grammar utterances, while if she says "yes, that's right" or "no-no", those are out-of-grammar. What happens when an out-of-grammar utterance is spoken by the user? The recognizer still tries to match it to the set of sentences described by the grammar, and will output the best match, or a rejection.
Since the grammar does not contain the spoken sentence, the recognizer's best match in the out-of-grammar case is always erroneous; therefore, to improve the accuracy of the grammar we need to expand it to cover, as much as possible, all the sentences that the user can utter. Another way to control the accuracy is to improve the rejection mechanism to guarantee that out-of-grammar utterances will be rejected. Based on these considerations we deployed the following methodology in the system design: improved rejection mechanisms for yes/no and other grammars. The yes/no grammar plays an important role in this application, since it is used for confirmation of captured results. The yes/no grammar was designed to have very high accuracy by a) incorporating several different variations on how users can say 'yes' and 'no', such as 'yes, that's right', 'no, it's wrong', etc., thereby reducing the out-of-grammar event frequency; and b) improving the rejection criterion of the recognizer. The most commonly used rejection criterion is based on the recognition score. A better way to detect and reject out-of-grammar utterances is to create a garbage model in the grammar. The garbage model does not contain specific sentences, and is designed to match the out-of-grammar utterances. Ideally, we would like to have a low rejection rate and a high accuracy rate, but there is a tradeoff between recognition accuracy and rejection: the higher the rejection rate, the more accurate the recognition results. The rejection rate in a grammar with a garbage model is controlled by two parameters: the relative weight of the garbage model and the score threshold below which the utterance is rejected. By tuning these parameters using data recorded from real usage, we can control the overall accuracy of such a grammar.
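The two rejection knobs described here, the garbage-model weight and the score threshold, can be sketched as a toy acceptance rule. The function and its parameters are illustrative assumptions, not a real recognizer API:

```python
# Toy acceptance rule combining the two tuning parameters described
# above: a score threshold and a relative garbage-model weight.
def accept(best_score, garbage_score, garbage_weight=1.0, threshold=0.6):
    """Accept the best in-grammar hypothesis only if it both clears
    the score threshold and beats the (weighted) garbage model."""
    return best_score >= threshold and best_score > garbage_weight * garbage_score

# Raising garbage_weight or threshold rejects more out-of-grammar
# utterances, at the cost of rejecting some in-grammar ones too.
```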
We designed the system to take into account the known limitations of automated speech recognition technology and to ensure overall high accuracy of data capture and session completion rate by: a) Improved rejection mechanisms for confirmation and other grammars. We incorporated a garbage model in the yes/no grammar used for confirmations in our application. The garbage model was designed to match out-of-vocabulary utterances [5][6], specifically the corrections users frequently provide instead of a negative confirmation, e.g., System prompt: Was that your left shoulder? User: no, right shoulder. We used a rejection criterion based on a combination of recognition score and garbage-model scoring to control the overall accuracy of this grammar. b) Using confirmations as the way to control the larger grammars' accuracy. The grammars that are substantially larger than yes/no are also those for which we can expect more ASR errors and out-of-vocabulary utterances. Those are grammars like the body-part grammar, or the symptoms grammar, where, without substantial data collection, we cannot accurately predict all possible ways the users will answer the "where does it hurt?" question or the "what's your most disturbing symptom?" question. For such grammars, we use the confirmation mechanism to control the overall accuracy of the data we capture. The result is considered captured only if the user answers "yes" to the confirmation question, reducing the error rate for the dialog units with larger grammars to the level of the yes/no grammar. c) Using recording to capture the out-of-grammar answers and problematic user inputs. In some cases, e.g. when the user is trying to answer the "where does it hurt?" question with a word that is not covered by the "body part" grammar, the confirmation mechanism does not help.
For cases like this one, we offer the user to say a key-phrase like 'none of those' and then just record the user's input: System prompt: "Was that your left shoulder?" User: "No" System prompt: "Sorry about that. Let's try it this way. Please choose carefully a body part from the following list that best describes the location of your pain, and just say it. If none of the locations match, please say 'none of those'. Here is the list: abdomen <pause>, ankles …" User (barges in): "none of those" System prompt: "Ok. Let me just record your answer. Please describe the location of your pain in your own words." User: <……> System prompt (after recording is finished): "Thanks, I got that. Let's move on." The recorded utterance is captured and flagged as "transcription needed" for later processing. The same mechanism of falling back to recording instead of recognition is used after several repeated recognition failures.
The flexible level of user support that is intended to satisfy both novice and experienced users is achieved by deploying the following mechanisms. - Prompt Design. The system prompts are designed to provide an appropriate level of support to the user. For example, the initial prompt for the 'Pain Location' dialog unit is "Where does it hurt? <pause> For example, your head, stomach or back? <pause> Remember, if you don't know how to answer this question, just say 'I need help'". The pauses in this prompt are designed to encourage the experienced user to barge in with the answer (most experienced users barge in after the initial "where does it hurt" portion of the prompt), while providing more information (in this case, examples of possible answers) for the inexperienced user who hesitates to answer immediately. It is designed to remind the user to ask for help if it is still not clear what can be said as an answer. - Context-sensitive help. It is unreasonable to expect system users to retain the information provided in the training material or at the system orientation session for the whole duration of a trial that can last for months. Therefore, for every question in the Pain Monitoring Voice Diary, help information is provided on the user's request, describing and clarifying the current question, in some cases enumerating the possible answers the caller can choose from, and in other cases giving more examples of possible answers. For example, if the caller asks for help after the "where does it hurt" question, the system will provide a very elaborate help prompt that lists different body parts that the user can say (pausing shortly after each one to encourage the user to barge in if the user knows what to say). It also reminds the user that they can choose the "none of those" option: "Okay. Here is the help information. At this point I need to find out the part of your body that hurts the most.
Please choose carefully a body part from the following list that best describes the location of your pain, and just say it. If none of them matches, please say 'none of those'. Here is the list: abdomen <pause>, ankles <pause>, back <pause>, ... (list continues) …, toes <pause>. Which one is it?" The information provided during these explicit requests for help closely follows the information the user received during the enrollment process. - Detecting speech recognition failures. Even when the user has not asked for help explicitly, the dialog is designed to detect the user's repeated failures and provide more support. When the system experiences recognition problems such as rejection or silence, it will re-prompt the user for the same question. The re-prompts are designed as an escalating list, providing increasingly more information and progressively constraining the user as more such errors are detected. For example, if the user's utterance is rejected by the recognizer after the initial prompt "Where does it hurt? <pause> For example, your head, stomach or back? <pause> Remember, if you don't know how to answer this question, just say 'I need help'", the system will re-prompt for the same information with "I didn't get that. Please tell me the part of your body that hurts the most. Remember, you could always say 'I need help'". The second prompt skips the pauses, reminds the user to ask for help if needed, and also clarifies the question ("body part that hurts the most"). Another case where the system detects that something went wrong with speech recognition is when the user says "no" to a confirmation question, as in: System prompt: Was that your left shoulder? User: No. System prompt: Sorry about that. Let's try it this way. Please choose carefully a body part from the following list that best describes the location of your pain, and just say it. If none of them matches, please say 'none of those'.
Here is the list: abdomen <pause>, … (list continues). Which one is it? Since the user disconfirmed the recognized body part, the system detects a recognition problem and gives the user more information on how this question can be answered, to minimize the out-of-grammar utterance rate. - Dialog Personalization. Data capture is a unique dialog application since not only do the users call the system many times during the trial, but they also identify themselves at the beginning of each session. This provides the system with an opportunity to personalize both the content of the current session (what data are to be collected) and the style (how to ask for these data) based on the results of the previous sessions. As shown in figure 1, in our system we took advantage of a larger inter-session context by designing two types of data collection sessions: normal and follow-up. The follow-up session type is deployed if the subject reported a high level of pain in the previous session. The follow-up session differs from the normal one not only by the additional questions the patient is asked, such as whether and when the subject took the medication, but also by the format of the questions. If in the previous session the subject reported pain in the left shoulder, in the follow-up session the question will be "Is the pain still in your left shoulder?". This format of "reminding" prompts was used for the pain-location and pain-type dialog units, and it was designed to shorten the dialogs and also provide the subject comfort and a feeling of continuity in using the system.
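The escalating re-prompt strategy for recognition failures described above can be sketched as a list of prompt levels indexed by the number of errors observed so far. This is a minimal illustrative sketch, not the Speech Matrix® implementation; the function name and structure are assumptions, and the prompt texts are paraphrased from the examples in the text.

```python
# Illustrative sketch of escalating re-prompts for one dialog unit
# (pain location). Each recognition failure -- a rejection, a silence
# timeout, or a disconfirmed result -- advances to a more constrained prompt.

ESCALATING_PROMPTS = [
    # Level 0: initial open prompt, with pauses for barge-in.
    "Where does it hurt? <pause> For example, your head, stomach or back? "
    "<pause> Remember, if you don't know how to answer this question, "
    "just say 'I need help'.",
    # Level 1: no pauses, clarified wording, help reminder.
    "I didn't get that. Please tell me the part of your body that hurts "
    "the most. Remember, you could always say 'I need help'.",
    # Level 2: fully constrained -- read the whole list of choices.
    "Let's try it this way. Please choose carefully a body part from the "
    "following list that best describes the location of your pain, and "
    "just say it. If none of them matches, please say 'none of those'. "
    "Here is the list: abdomen, ankles, back, ..., toes. Which one is it?",
]

def next_prompt(error_count: int) -> str:
    """Return the prompt for the current attempt, escalating with each
    detected recognition failure and staying at the most constrained
    level once all levels are exhausted."""
    level = min(error_count, len(ESCALATING_PROMPTS) - 1)
    return ESCALATING_PROMPTS[level]
```

The same escalation ladder also serves caller-initiated help: an explicit “I need help” can simply jump to the most constrained level.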
In the normal session the subject indicated a pain level of 7, which is above the set threshold of 5. Based on the protocol, a follow-up session is initiated at an interval of 4 hours instead of 24.
Experimental evaluation of the usability of the Pain Monitoring Voice Diary was performed with 24 volunteers, mostly students recruited on campus. The volunteers were asked to contribute ten sessions with the system over a period of two weeks; in practice the number of sessions per subject ranged from 1 to 12. No formal training session with the system was provided; instead, once enrolled (through a website), the subjects received an email notification with their PIN and general information about the system. The subjects were asked either to relate to pain episodes in their past while answering the system’s questions, or to use as guidance one of nine provided medical scenarios compiled by a pain specialist, ranging from migraines and back pain to post-surgery pain (knee injury) and cancer- and chemotherapy-related afflictions. We collected a total of 118 dialog sessions: 113 sessions were completed, while in 5 the caller hung up. 42 of the completed sessions were of the ‘follow-up’ type. There were a total of 1766 dialog turns, where a dialog turn corresponds to one system prompt and one user utterance. The data capture rate, measuring the percentage of slots filled automatically, was 98%; the other 2% were flagged for transcription. The data capture rate is not a direct measure of ASR accuracy, since slots are not necessarily filled after the first attempt. Among the utterances sent to transcription, where the user had opted for the ‘none of those’ option, 80% corresponded to the type-of-pain slot and 20% to the symptoms slot, indicating that those are the grammars with the highest out-of-vocabulary rate.
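The per-slot breakdown of transcription-flagged utterances reported above is a simple frequency count. The sketch below shows one way to compute it from a log of flagged slot names; the function name and slot labels are illustrative, not part of the Speech Matrix® system.

```python
# Illustrative: share of transcription-flagged utterances per slot type,
# as in the reported 80% / 20% split between 'pain_type' and 'symptoms'.
from collections import Counter

def transcription_breakdown(flagged_slots):
    """Given a list of slot names flagged for transcription, return the
    fraction of flagged utterances attributed to each slot."""
    counts = Counter(flagged_slots)
    total = sum(counts.values())
    return {slot: n / total for slot, n in counts.items()}
```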
Table 1 shows other metrics derived from the dialogs [7]: average session duration; number of dialog units per session; average duration of a dialog unit; average number of caller utterances per dialog unit; average duration of one dialog turn; percentage of barged-in prompts; and percentage of task-oriented prompts. The high standard deviations of session duration and dialog units per session are due to the extensive variability of the dialog sessions. Not only do the sessions differ by type (normal and follow-up), but there is also branching within the same session type (e.g., some subjects report symptoms while others don’t, some take medications, etc.). In addition, there is great variability due to ASR errors and the different paths inherent in the design of the call flow (e.g., caller-initiated help requests, and speech recognition error handling such as re-prompts and negative confirmations). The high standard deviations in caller utterances per dialog unit and in dialog unit duration are due to the fact that not all dialog units are created equal. For example, the ‘Are you in pain?’ dialog unit can fill a slot with a single ‘yes/no’ utterance, while the ‘Pain Location’ unit requires at least two caller utterances (body part and confirmation) when speech recognition does not fail, and more when it does. The percentage of task-oriented dialog turns (those that are NOT due to speech recognition errors or caller help requests) is a measure of dialog efficiency: if there were no errors and no help requests at all, it would be 100%. The prompts in the dialog were designed to be barged in on by experienced callers. To quantify the use of barge-in, we computed the percentage of barged-in prompts (66%). To quantify how far into the prompts barge-in occurs, we computed the average duration of a dialog turn (7.19 sec) and compared it to the reference average duration of a dialog turn (10.63 sec) when barge-in was disabled.
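The turn-level metrics discussed above (average turn duration, barge-in rate, and task-oriented turn percentage) can be computed directly from logged dialog turns. This is a minimal sketch under the definitions given in the text; the `Turn` record and function name are assumptions, and the numbers in the usage note are made-up illustrations, not the study’s results.

```python
# Illustrative sketch of computing Table-1-style metrics from a log of
# dialog turns. A turn pairs one system prompt with one user utterance;
# a turn is 'task-oriented' if it is not a re-prompt after an ASR error
# and not a response to a caller help request.
from dataclasses import dataclass

@dataclass
class Turn:
    duration_sec: float
    barged_in: bool       # caller spoke over the prompt
    task_oriented: bool   # not due to an ASR error or a help request

def session_metrics(turns):
    """Return average turn duration, barge-in percentage, and
    task-oriented turn percentage for a list of dialog turns."""
    n = len(turns)
    return {
        "avg_turn_duration_sec": sum(t.duration_sec for t in turns) / n,
        "pct_barged_in": 100.0 * sum(t.barged_in for t in turns) / n,
        "pct_task_oriented": 100.0 * sum(t.task_oriented for t in turns) / n,
    }
```

On a hypothetical four-turn session, the same computation that produced the reported 66% barge-in rate and 7.19-sec average turn duration would reduce to these three aggregates.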
To address those issues and to increase the efficiency of data collection while decreasing time and costs, Spacegate designed Speech Matrix®, an innovative and comprehensive clinical data collection system utilizing natural language, voice, and automated speech recognition (ASR) technologies for over-the-phone, real-time data collection. Speech Matrix® allows direct interaction with back-office system databases using any phone as an “input” device, and it is built on natural speech and choice of expression. Because people are accustomed to using telephones, data input through the telephone has the inherent advantages of familiarity, simplicity, and ease of access, and Speech Matrix® allows data to be collected in real time in the patient’s or participant’s native environment. Speech Matrix® Voice Data Collection applications are receiving an enthusiastic reception from the clinical and behavioral research communities, including the National Institutes of Health. The system brings a significant reduction in research and study costs, shortens the time for data collection, and provides a new, sophisticated, and extensive data collection methodology for clinical research.