1. Speech Matrix® Systems
Automated Speech Recognition for Self-reports in Health Research
Esther Levin (Spacegate, Inc.; City College of New York), esther@spacegate.com
Alex Levin (Spacegate, Inc.), alex@spacegate.com
3. Background
Health Data Collection
Traditional methods
Paper based diaries
Observations
Modern methods – Electronic Data Capture (EDC)
PDA
Web based
IVR (touch-tone)
Next Step - EDC
ASR (automated speech recognition)
Confidential – Spacegate, Inc. 3
4. Electronic Data Capture –
Goals & Challenges
Real data in real-time
Data validation
Regulatory compliance
Increase clinical trial efficiency
Reduce time-to-market
Cost
Patient/Subject burden
5. Automated Speech
Technology – Science & Art
Voice to Data - voice data entry via automated
natural dialogue
Automated Dialogue – Voice Forms
Voice Interface Design
Real-time Monitoring & Reporting
Privacy & Security
Regulatory Compliance
6. ASR System for Data Collection
7. Why Speech?
Speech is a natural modality of interaction
The phone is user-friendly and ubiquitous; no special training is required for its use
Dynamic dialogue flow
personalization of both content and style based on the caller's profile and history
Real-time feedback and monitoring
real-time reports of captured data
Automated compliance monitoring
Flexible and extensive scheduling
Inbound/outbound call sessions
calls can be initiated by the system following a prescribed protocol
Overall, an ASR-based system offers an extensive and practical tool to facilitate efficient and convenient real-time, two-way communication.
8. Speech Matrix®-VDC™ Systems
Extensive data collection system for health, clinical, life science and behavioral research
Another branch of EDC
Real-time data capture in the participant's native environment
9. Applications
Pain Monitoring Diary (PMD™): prototype application based on questions in a standard pain questionnaire (the Brief Pain Inventory, BPI)
Drug Use Diary (VDMD™): prototype application based on questions in a standard drug use diary (Dr. Linda Sobell)
Implements Dynamic Questionnaires
Interactivity & Communications
Reporting and Management
10. Dialog Design
Task characteristics:
Need to guarantee data validity, accuracy and integrity, taking into account
speech recognition errors
improve the overall accuracy using dialog actions such as re-prompts,
confirmations, error handling, and, if necessary, recording and flagging the
unrecognized utterances for later transcription
The system should accommodate both novice and experienced callers
enough information and help to guarantee question understanding and successful session completion for novices
a short and effective call flow for experienced callers
Subjects identify themselves at the beginning of each session.
Opportunity to use the knowledge accumulated across sessions for
personalization.
Subjects may receive some training on the use of the spoken dialog system
during the enrollment session.
Dialog Design Issues:
controlling the accuracy of the captured data
adaptive level of user support
11. Controlling the Accuracy of Data Capture:
ASR 101
Speech Recognition Grammar Design
Example: yes/no grammar {yes, no}
The caller's utterance is matched against the possibilities described by the grammar
The output of ASR is the best matching ‘sentence’,
and a score
If the score is too low => rejection
Out-of-vocabulary utterances cannot be
recognized correctly
Design tradeoff: minimizing out-of-vocabulary utterances (which favors a larger grammar) vs. minimizing grammar size (which favors higher in-grammar accuracy)
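The grammar-matching and rejection mechanics above can be sketched in Python. This is an illustrative toy, not the actual recognizer: a real ASR engine matches acoustic models rather than text strings, and the similarity measure and threshold here are assumptions.

```python
# Toy sketch of grammar-constrained recognition with score-based
# rejection. A text-similarity score stands in for the acoustic
# matching score of a real recognizer.
from difflib import SequenceMatcher

def recognize(utterance, grammar, threshold=0.7):
    """Match an utterance against every sentence in the grammar and
    return (best_sentence, score), or ('<rejected>', score) when the
    best score falls below the rejection threshold."""
    scored = [(s, SequenceMatcher(None, utterance.lower(), s).ratio())
              for s in grammar]
    best, score = max(scored, key=lambda p: p[1])
    if score < threshold:
        return "<rejected>", score
    return best, score

yes_no = ["yes", "no"]
print(recognize("yes", yes_no))    # in-grammar: perfect match
print(recognize("no-no", yes_no))  # out-of-grammar: rejected
```

The same tradeoff from the slide shows up here: adding more sentences to `grammar` covers more out-of-vocabulary utterances, but enlarges the set the best match is chosen from.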
12. Controlling the Accuracy of Data Capture.
Improved rejection mechanisms to deal with out-of-vocabulary utterances.
System prompt: Was that your left shoulder?
User: no, left elbow
System prompt: I didn't get that. Was that your left shoulder? Please say 'yes' or 'no'.
Reliable confirmation recognition.
Using confirmations as the way to control the larger grammar’s accuracy.
Using recording to capture the out-of-grammar answers and problematic
user inputs.
System prompt: “Was that your left shoulder?”
User: “No”
System prompt: “Sorry about that. Let’s try it this way. Please choose carefully a body part from the
following list that best describes the location of your pain, and just say it. If none of the locations match,
please say ‘none of those’. Here is the list: abdomen <pause>, ankles …”
User (barges in): "none of those"
System prompt: “Ok. Let me just record your answer. Please describe the location of your pain in your
own words.”
User: <……>
System prompt (after recording is finished): “Thanks, I got that. Let’s move on.”
The recorded utterance is captured and flagged as "transcription needed" for later processing.
The same fall-back from recognition to recording is used after several repeated recognition failures.
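The confirmation-and-recording strategy above can be sketched as one dialog unit. The function name, arguments and return format are illustrative assumptions, not the Speech Matrix® API:

```python
# Hypothetical sketch of one dialog unit: confirm each recognized
# value; on 'none of those' or repeated failures, fall back to
# recording the answer and flag it for later transcription.
def capture_slot(recognize, confirm, record, max_attempts=3):
    for _ in range(max_attempts):
        value = recognize()              # e.g. the body-part grammar
        if value == "none of those":
            break                        # out-of-grammar: record it
        if confirm(value):               # reliable yes/no grammar
            return {"value": value, "needs_transcription": False}
    # fall back: capture audio and flag it for later processing
    return {"value": record(), "needs_transcription": True}

# A confirmed in-grammar answer:
ok = capture_slot(lambda: "left shoulder", lambda v: True, lambda: None)
# An out-of-grammar answer falls back to recording:
rec = capture_slot(lambda: "none of those", lambda v: True,
                   lambda: "<audio: utterance.wav>")
```

The confirmation step is what lets a large, error-prone grammar inherit the accuracy of the small yes/no grammar, as the slide describes.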
13. Adaptive Level of User Support-I.
Prompt Design.
"Where does it hurt? <pause> For example, your head, stomach or back? <pause> Remember, if you don't know how to answer this question, just say 'I need help'."
Context-sensitive help.
help information describes and clarifies the current question,
Provides examples of possible answers
Example: help for the “where does it hurt” question: “Okay.
Here is the help information. At this point I need to find out the
part of your body that hurts the most. Please choose carefully a
body part from the following list that best describes the location
of your pain, and just say it. If none of them matches, please say
‘none of those’. Here is the list: abdomen <pause>, ankles
<pause>, back <pause>,...( list continues) …, toes <pause>.
Which one is it?"
14. Adaptive Level of User Support-II
Detecting speech recognition failures.
The re-prompts are designed as an escalating list, providing increasingly
more information and progressively constraining the caller.
"Where does it hurt? <pause> For example, your head, stomach or back? <pause> Remember, if you don't know how to answer this question, just say 'I need help'."
"I didn't get that. Please tell me the part of your body that hurts the most. Remember, you could always say 'I need help'."
Detecting Misunderstandings.
the user says “no” to a confirmation question as in:
System prompt: Was that your left shoulder?
User: No.
System prompt: Sorry about that. Let’s try it this way. Please choose
carefully a body part from the following list that best describes the
location of your pain, and just say it. If none of them matches, please
say ‘none of those’. Here is the list: abdomen <pause>, … (list
continues). Which one is it?
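The escalating re-prompt list can be sketched as follows. The prompt texts are abridged from the slides; the selection function is an assumption about how the control logic might pick among them:

```python
# Sketch of escalating re-prompts: each recognition failure moves to
# a more explicit and more constraining prompt, capped at the last one.
PROMPTS = [
    "Where does it hurt? For example, your head, stomach or back?",
    "I didn't get that. Please tell me the part of your body that "
    "hurts the most. Remember, you could always say 'I need help'.",
    "Please choose carefully a body part from the following list ... "
    "If none of them matches, please say 'none of those'.",
]

def next_prompt(failures):
    """Return the prompt for the given failure count; repeated
    failures stay on the most constraining prompt."""
    return PROMPTS[min(failures, len(PROMPTS) - 1)]
```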
15. Adaptive Level of User Support-III
Dialog Personalization.
Taking advantage of a larger inter-session context by designing two types of data collection sessions: normal and follow-up.
Both content and style are different.
Reminding prompts: "Is the pain still in your left shoulder?"
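The session-type selection and "reminding" prompts can be sketched as follows. The pain threshold of 5 and the 4- vs. 24-hour call intervals are taken from the editor's notes; the report field names are hypothetical:

```python
# Sketch of inter-session personalization: the previous session's
# report drives both what the next session asks (follow-up vs. normal)
# and how it asks it ('reminding' prompts).
PAIN_THRESHOLD = 5  # per protocol, high pain triggers a follow-up

def plan_next_session(previous_report):
    """Schedule a follow-up in 4 hours if the reported pain exceeded
    the threshold; otherwise a normal session in 24 hours."""
    if previous_report.get("pain_intensity", 0) > PAIN_THRESHOLD:
        return {"type": "follow-up", "hours_until_call": 4}
    return {"type": "normal", "hours_until_call": 24}

def location_prompt(previous_report):
    """Use a 'reminding' prompt when a location is already known."""
    loc = previous_report.get("pain_location")
    if loc:
        return f"Is the pain still in your {loc}?"
    return "Where does it hurt?"
```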
16. NORMAL SESSION
System: Your PIN please.
Caller: ****
System: Ok, *****, let's begin. Are you in pain now?
Caller: yes
System: Where does it hurt? <pause> For example, your head, stom...
Caller: my left shoulder
System: Was that your left shoulder?
Caller: yes
System: Please rate your pain on a scale of zero to ten, with zero being no pain and ten being the worst possible pain.
Caller: it is seven
System: Was that 7? <pause> Please say yes or no.
Caller: yeah
System: How does the pain feel?
Caller: I don't know
System: Okay, here is the help information. At this point I need to find out the type of your pain. Please choose carefully a word from the following list that best describes your pain and just say it. If nothing matches, please say 'none of those'. Here is the list: aching, <pause> burning, <pause> dull...
Caller: it feels sharp
System: Was that 'sharp'?
Caller: yeah
System: Is the pain constant or does it come and go?
Caller: comes and goes
System: Do you have any other symptoms?
Caller: yeah
System: Please tell me your most disturbing symptom. For...
Caller: I have fever
System: I heard 'fever'. Is that right?
Caller: yes
System: Would you like to leave a message for the staff? …
Caller: no
System: Thanks for your cooperation. I've got all the information that I need. I'll call you again in 4 hours. Goodbye.
17. FOLLOW-UP SESSION
System: Your PIN please.
Caller: ****
System: Hi, Esther, this is your follow-up call … Are you still having pain?
Caller: yeah
System: Did you take your pain medication today?
Caller: yes
System: What time did you take the medication?
Caller: Two hours ago
System: Was that around 9:45 am?
Caller: yeah
System: Did the medicine diminish your pain?...
Caller: yeah
System: Please rate your pain on a scale of zero to ten, with zero being no pain and ten being the worst possible pain.
Caller: four
System: Was that 4?
Caller: yeah
System: Is the pain still in your left shoulder?
Caller: yes
System: Does the pain still feel sharp?
Caller: no
System: How does the pain feel?
Caller: It's dull
System: Was that 'dull'?
Caller: yeah
System: Is the pain constant or does it come and go?
Caller: It's constant
System: Do you have any other symptoms?
Caller: no
System: Would you like to leave a message for the staff? …
Caller: no
System: Thanks for your cooperation. I've got all the information that I need. I'll call you again tomorrow. Goodbye.
18. Normal session report
Slot | Captured Value | Confirmed (yes/no) | Confidence Score
Pin | **** | no | 66
Are you in pain? | yes | no | 80
Pain Location | left shoulder | yes | 86
Pain Intensity | 7 | yes | 88
Pain Type | sharp | yes | 88
Pain constant? | pain comes and goes | no | 47
Symptoms | fever | yes | 86
Message | none | no | 78
19. Follow-up session report
Slot | Captured Value | Confirmed | Confidence Score
Pin | ***** | no | 74
In pain? | yes | no | 85
Medication taken? | yes | no | 76
Medication time | 9:45 am | yes | 69
Medication helped? | yes | no | 75
Pain Rating | 4 | yes | 87
Pain Location | left shoulder | yes | 87
Pain Type | dull | yes | 86
Pain constant? | constant | no | 54
Symptoms | none | yes | 82
Message | none | no | 84
20. Usability Test
24 subjects
118 dialog sessions
113 completed
5 hang-ups
42 follow-up
1766 dialog turns
98% automatic data capture – the rest
flagged for transcription
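As a quick arithmetic check on the headline numbers above, 113 completed sessions plus 5 hang-ups accounts for all 118 sessions, a roughly 96% completion rate:

```python
# Completion-rate check for the usability test figures.
sessions, completed, hangups = 118, 113, 5
assert completed + hangups == sessions
completion_rate = round(100 * completed / sessions, 1)
print(completion_rate)  # 95.8
```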
21. Results
Metric | Mean (SD)
Session duration (sec) | 105.6 (46.78)
Number of dialog units per session | 7.85 (2.6)
Duration of a dialog unit (sec) | 13.46 (4.54)
Dialog turns per dialog unit | 1.88 (0.46)
Percentage of task-oriented turns | 80% (16)
Percentage of barged-in prompts | 66% (13)
Duration of a dialog turn (sec) | 7.19 (1.10)
Duration of a dialog turn with barge-in disabled (sec) | 10.63 (1.5)
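The per-session means are mutually consistent: dialog units per session times turns per unit roughly matches total turns per session, assuming the 1766 turns from the usability-test slide cover all 118 sessions:

```python
# Consistency check on the reported means.
units_per_session = 7.85   # dialog units per session (mean)
turns_per_unit = 1.88      # dialog turns per dialog unit (mean)
turns_per_session = 1766 / 118                  # observed, ~14.97
estimate = units_per_session * turns_per_unit   # predicted, ~14.76
# The two figures agree to within about 1.5%.
assert abs(estimate - turns_per_session) / turns_per_session < 0.02
```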
22. Summary
ASR & Spoken Dialog Methodology for data
capture can provide:
Additional real-time data collection tool
Flexible protocol design
Improved data validation and compliance
Centralized collection and monitoring
Telephone as ubiquitous device
System design needs to take into account the
specificities of the task and the limitations of the
technology
Flexible level of user support
Controlled accuracy of the captured data
Editor's Notes
Use of questionnaires is an essential method of data collection. Very often, research or study findings are based significantly or completely on questionnaire responses. While designing valid questionnaires is an art, the tools and methods of data collection are no less important and can often influence the research outcome. The life and behavioral science industries are constantly looking to address the challenge of reliable, real-time data collection, compliance and validation. Some offer the use of PDAs or the Web, but these also face their own array of issues and challenges. Traditional data collection methods vary from paper-based diaries and reports, to video/audio recordings, to human observation. However, doubts have been cast on the validity of data collected through paper-based methods of self-report, notably by a recent study demonstrating that most (79%) of the paper diary entries by patients were falsified (Stone & Shiffman). Recently, electronic data collection techniques utilizing hand-held computer devices (PDAs), the Internet, or cellular phones have been introduced. These methodologies enable collection of meta-data about the respondent's compliance and the use of such data to measure and improve compliance. The flexibility and open nature of the Internet has created expectations that all technologies offer innovative, flexible and revenue-generating features. The ultimate goal of electronic data capture (EDC) is multi-modal access, retrieval and collection of information via all current and future communication channels and devices, which is within our reach. The data should retain its natural format, but the means to access it should be flexible. While we continue to explore Internet capabilities, a new generation of technology is opening new possibilities for data access.
Some of the biggest challenges for pharmaceutical companies are the need to increase clinical trial efficiency, decrease time-to-market, and ensure data validation and regulatory compliance. Clinical trials of new products take considerable time and expense, and data collection and processing is generally considered a major bottleneck. Today, clinical trials professionals use several methods to capture and record research data. Paper dominates, but electronic data capture (EDC) tools, ranging from remote data entry (RDE), scanning or faxing, to the new generation of Web, PDA and IVR techniques have also been employed, each with its own issues.
Automated speech technologies and services are clearly expanding their reach from pure single-utterance recognition to more complex conversational, dialogue-based information distribution. The search for ultimate communication channels that have the reliability of the telephone and the richness of the Internet will continue for the near future, but today the advent of automated speech recognition, the voice web, near-human synthetic voices (TTS), and the VXML/SALT standards opens new possibilities for the design and deployment of practical applications. Implementation of a successful speech application involves a multidisciplinary approach encompassing established speech-application design principles and the unique requirements and limitations of the application domain. The ultimate goal of any application design is its usefulness and simplicity of use for its target audience. Dialogue-driven speech systems are widely implemented in various fields. Speech Matrix® applications are iterative and interactive. A natural dialogue is designed and presented with various degrees of complexity and depth to address issues like disambiguation, confirmation, verification and dynamic dialogue flow. Administrative functions and scripts are designed to comply with research protocols, such as event- or time-contingent calls, random call flow scripts and random/scheduled call sessions. The flexible dialogue design can put emphasis on a specific subject of research better than other available methods of data collection. Caller ID is used for identification and security. Voice verification techniques can add additional levels of security and authentication. Every utterance of the call session is recorded as a sound file and can be used as an audit trail. This also addresses regulatory issues of data preservation, authentication, validation and compliance.
Speech Matrix® is an innovative clinical research voice data collection system based on natural speech and choice of expression. People are accustomed to using telephone devices, and hence data input through the telephone has the inherent advantage of familiarity, simplicity and ease of access. Speech Matrix® allows data collection in real time in the patient's/participant's native environment. Speech Matrix® Voice Data Collection applications are receiving an enthusiastic reception from the clinical and behavioral research communities, including the National Institutes of Health.
The advantages of using this technology are: - Speech is a natural modality of interaction for humans, and the input device, the phone, is user friendly and ubiquitous, with no special training required for its use (as opposed to PDAs or computers). - Compliance is monitored automatically: the calls can be initiated by a system following a prescribed protocol, and the system can report any non-compliance to the trial administrator in real time. - Spoken automated dialog reaches well beyond voice-enabling static paper questionnaires: possible answers are not limited by the number of check-boxes that fit on a piece of paper; question selection can be done dynamically based on previous answers; personalization of both content and style based on the patient's history is possible. - The ability to transform the captured data into real-time reports, and further interface the information with other clinical or back-office systems and databases, provides an unparalleled opportunity to enhance patient feedback and monitoring. Overall, an ASR-based system offers the caregiver an extensive and practical tool to facilitate efficient and convenient patient communications, which saves time while increasing quality of care.
For this study we implemented a dialog system for chronic pain patients' assessment and monitoring. In the US alone, an estimated 10 million-plus individuals live with chronic pain, and recently the Joint Commission on Accreditation of Healthcare Organizations called pain the fifth vital sign that providers should monitor in the care of patients [2], along with temperature, pulse, respiration, and blood pressure. Pain assessment is also an application for which well-established standard questionnaires [2][3][4] are available, and the vocabulary for potential answers can be established from the medical literature. Figure 1 shows the dialog flow for the Pain Monitoring Diary. The dialog flow is represented as a series of dialog units, where each unit comprises several caller-system exchanges designed to elicit one piece of information from the caller to fill a slot in the session report. Figure 2 shows a transcribed session and its corresponding report, automatically generated by the system at the end of the session.
The characteristics and requirements of the data capture task are different from those of other applications of spoken dialog technology. Successful dialog design needs to take the following specificities of this task into account: - The subjects participating in data collection are enrolled through a personal face-to-face interview at which they receive relevant information about the trial and guidance on the process of data collection. At the same opportunity the patients can receive some training, explanation and possibly a demo of how to use the spoken dialog system. - Subjects call the system repeatedly according to the study protocol, and identify themselves at the beginning of each session. This provides an opportunity to use the knowledge accumulated across sessions for personalization. - The system should accommodate both novice callers (at the beginning of the trial) and experienced callers (those who have completed several sessions). For the experienced caller, the system needs to provide a short and effective call flow, without making the caller hear long and tedious prompts. For the novice caller, the system needs to provide enough information and help to guarantee question understanding and successful session completion. - Data validity, accuracy and integrity are very important in this application, since the penalty for an erroneously filled final session report can be very high. Since automated speech recognition technology is not perfect, the design has to take into account the possibility of speech recognition errors and improve the overall accuracy using dialog actions such as re-prompts, confirmations, error handling, and, if necessary, recording and flagging the unrecognized utterances for later transcription. In the design of the dialog we addressed these task characteristics by providing an adaptive level of user support and controlling the captured-data accuracy.
High and controllable accuracy. We designed the system to take into account the limitations of current speech recognition technology and to be able to control the overall system accuracy. In general, the most important parameter determining the accuracy of speech recognition is the size and complexity of the grammar used for recognition of the current utterance. The grammar in speech recognition describes (as text) the set of all possible sentences that can be recognized by the system. For example, the simplest grammar that can be used in yes/no recognition contains only two sentences: {"yes", "no"}. During speech recognition, the current user utterance is matched against every sentence in the grammar, and the recognition result is given as the best-matching sentence within the grammar, together with a score that measures the quality of the match. If the quality of the match is not good enough, the recognizer will output a 'rejection'. The size of the grammar used for recognition plays an important role in the expected accuracy of recognition results. On one hand, the smaller the grammar, i.e., the smaller the set of possible answers to distinguish between, the higher the accuracy for 'in-grammar' user utterances. In-grammar means that the user has spoken one of the sentences covered by the grammar, while out-of-grammar utterances are those not described by the grammar. For our previous example, if the user simply says "yes" or "no", those are in-grammar utterances, while if she says "yes, that's right" or "no-no", those are out-of-grammar. What happens when an out-of-grammar utterance is spoken by the user? The recognizer still tries to match it to the set of sentences described by the grammar, and will output the best match, or a rejection.
Since the grammar does not contain the spoken sentence, the recognizer's best match in the out-of-grammar case is always erroneous; therefore, to improve the accuracy of the grammar we need to expand it to cover, as much as possible, all the sentences that the user can utter. Another way to control the accuracy is to improve the rejection mechanism to guarantee that out-of-grammar utterances will be rejected. Based on these considerations we deployed the following methodology in the system design: improved rejection mechanisms for yes/no and other grammars. The yes/no grammar plays an important role in this application, since it is used for confirmation of captured results. The yes/no grammar was designed to have very high accuracy by a) incorporating several different variations on how users can say 'yes' and 'no', such as 'yes, that's right', 'no, it's wrong', etc., thereby reducing the out-of-grammar event frequency; and b) improving the rejection criterion of the recognizer. The most commonly used rejection criterion is based on the recognition score. A better way to detect and reject out-of-grammar utterances is to create a garbage model in the grammar. The garbage model does not contain specific sentences, and is designed to match the out-of-grammar utterances. Ideally, we would like to have a low rejection rate and a high accuracy rate, but there is a tradeoff between recognition accuracy and rejection: the higher the rejection rate, the more accurate the recognition results. The rejection rate in a grammar with a garbage model is controlled by two parameters: the relative weight of the garbage model and the score threshold below which the utterance is rejected. By tuning these parameters using data recorded from real usage, we can control the overall accuracy of such a grammar.
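The two rejection knobs described here, the garbage-model weight and the score threshold, can be sketched as a toy acceptance rule. The function and its parameters are illustrative assumptions, not a real recognizer API:

```python
# Toy acceptance rule combining the two tuning parameters described
# above: a score threshold and a relative garbage-model weight.
def accept(best_score, garbage_score, garbage_weight=1.0, threshold=0.6):
    """Accept the best in-grammar hypothesis only if it both clears
    the score threshold and beats the (weighted) garbage model."""
    return best_score >= threshold and best_score > garbage_weight * garbage_score

# Raising garbage_weight or threshold rejects more out-of-grammar
# utterances, at the cost of rejecting some in-grammar ones too.
```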
We designed the system to take into account the known limitations of automated speech recognition technology and to ensure overall high accuracy of data capture and session completion rate by: a) Improved rejection mechanisms for confirmation and other grammars. We incorporated a garbage model in the yes/no grammar used for confirmations in our application. The garbage model was designed to match out-of-vocabulary utterances [5][6], specifically the corrections users frequently provide instead of a negative confirmation, e.g., System prompt: Was that your left shoulder? User: no, right shoulder. We used a rejection criterion based on a combination of recognition score and garbage-model scoring to control the overall accuracy of this grammar. b) Using confirmations as the way to control the larger grammars' accuracy. The grammars that are substantially larger than yes/no are also those for which we can expect more ASR errors and out-of-vocabulary utterances. Those are grammars like the body-part grammar, or the symptoms grammar, where, without substantial data collection, we cannot accurately predict all possible ways the users will answer the "where does it hurt?" question or the "what's your most disturbing symptom?" question. For such grammars, we use the confirmation mechanism to control the overall accuracy of the data we capture. The result is considered captured only if the user answers "yes" to the confirmation question, reducing the error rate for the dialog units with larger grammars to the level of the yes/no grammar. c) Using recording to capture the out-of-grammar answers and problematic user inputs. In some cases, e.g. when the user is trying to answer the "where does it hurt?" question with a word that is not covered by the "body part" grammar, the confirmation mechanism does not help.
For cases like this one, we offer the user to say a key-phrase like 'none of those' and then just record the user's input: System prompt: "Was that your left shoulder?" User: "No" System prompt: "Sorry about that. Let's try it this way. Please choose carefully a body part from the following list that best describes the location of your pain, and just say it. If none of the locations match, please say 'none of those'. Here is the list: abdomen <pause>, ankles …" User (barges in): "none of those" System prompt: "Ok. Let me just record your answer. Please describe the location of your pain in your own words." User: <……> System prompt (after recording is finished): "Thanks, I got that. Let's move on." The recorded utterance is captured and flagged as "transcription needed" for later processing. The same mechanism of falling back to recording instead of recognition is used after several repeated recognition failures.
The flexible level of user support that is intended to satisfy both novice and experienced users is achieved by deploying the following mechanisms. - Prompt Design. The system prompts are designed to provide an appropriate level of support to the user. For example, the initial prompt for the 'Pain Location' dialog unit is "Where does it hurt? <pause> For example, your head, stomach or back? <pause> Remember, if you don't know how to answer this question, just say 'I need help'". The pauses in this prompt are designed to encourage the experienced user to barge in with the answer (most experienced users barge in after the initial "where does it hurt" portion of the prompt), while providing more information (in this case, examples of possible answers) for the inexperienced user who hesitates to answer immediately. It is designed to remind the user to ask for help if it is still not clear what can be said as an answer. - Context-sensitive help. It is unreasonable to expect system users to retain the information provided in the training material or at the system orientation session for the whole duration of a trial that can last for months. Therefore, for every question in the Pain Monitoring Voice Diary, help information is provided on the user's request, describing and clarifying the current question, in some cases enumerating the possible answers the caller can choose from, and in other cases giving more examples of possible answers. For example, if the caller asks for help after the "where does it hurt" question, the system will provide a very elaborate help prompt that lists different body parts that the user can say (pausing shortly after each one to encourage the user to barge in if the user knows what to say). It also reminds the user that they can choose the "none of those" option: "Okay. Here is the help information. At this point I need to find out the part of your body that hurts the most.
Please choose carefully a body part from the following list that best describes the location of your pain, and just say it. If none of them matches, please say 'none of those'. Here is the list: abdomen <pause>, ankles <pause>, back <pause>, ... (list continues) …, toes <pause>. Which one is it?" The information provided during these explicit requests for help closely follows the information the user received during the enrollment process. - Detecting speech recognition failures. Even when the user has not asked for help explicitly, the dialog is designed to detect the user's repeated failures and provide more support. When the system experiences recognition problems such as rejection or silence, it will re-prompt the user for the same question. The re-prompts are designed as an escalating list, providing increasingly more information and progressively constraining the user as more such errors are detected. For example, if the user's utterance is rejected by the recognizer after the initial prompt "Where does it hurt? <pause> For example, your head, stomach or back? <pause> Remember, if you don't know how to answer this question, just say 'I need help'", the system will re-prompt for the same information with "I didn't get that. Please tell me the part of your body that hurts the most. Remember, you could always say 'I need help'". The second prompt skips the pauses, reminds the user to ask for help if needed, and also clarifies the question ("body part that hurts the most"). Another case where the system detects that something went wrong with speech recognition is when the user says "no" to a confirmation question, as in: System prompt: Was that your left shoulder? User: No. System prompt: Sorry about that. Let's try it this way. Please choose carefully a body part from the following list that best describes the location of your pain, and just say it. If none of them matches, please say 'none of those'.
Here is the list: abdomen <pause>, … (list continues). Which one is it? Since the user disconfirmed the recognized body part, the system detects a recognition problem and gives the user more information on how this question can be answered, to minimize the out-of-grammar utterance rate. - Dialog Personalization. Data capture is a unique dialog application since not only do the users call the system many times during the trial, but they also identify themselves at the beginning of each session. This provides the system with an opportunity to personalize both the content of the current session (what data are to be collected) and the style (how to ask for these data) based on the results of the previous sessions. As shown in figure 1, in our system we took advantage of a larger inter-session context by designing two types of data collection sessions: normal and follow-up. The follow-up session type is deployed if the subject reported a high level of pain in the previous session. The follow-up session differs from the normal one not only by the additional questions the patient is asked, such as whether and when the subject took the medication, but also by the format of the questions. If in the previous session the subject reported pain in the left shoulder, in the follow-up session the question will be "Is the pain still in your left shoulder?". This format of "reminding" prompts was used for the pain-location and pain-type dialog units, and it was designed to shorten the dialogs and also provide the subject comfort and a feeling of continuity in using the system.
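The escalating re-prompt strategy for recognition failures described above can be sketched as a list of prompt levels indexed by the number of errors observed so far. This is a minimal illustrative sketch, not the Speech Matrix® implementation; the function name and structure are assumptions, and the prompt texts are paraphrased from the examples in the text.

```python
# Illustrative sketch of escalating re-prompts for one dialog unit
# (pain location). Each recognition failure -- a rejection, a silence
# timeout, or a disconfirmed result -- advances to a more constrained prompt.

ESCALATING_PROMPTS = [
    # Level 0: initial open prompt, with pauses for barge-in.
    "Where does it hurt? <pause> For example, your head, stomach or back? "
    "<pause> Remember, if you don't know how to answer this question, "
    "just say 'I need help'.",
    # Level 1: no pauses, clarified wording, help reminder.
    "I didn't get that. Please tell me the part of your body that hurts "
    "the most. Remember, you could always say 'I need help'.",
    # Level 2: fully constrained -- read the whole list of choices.
    "Let's try it this way. Please choose carefully a body part from the "
    "following list that best describes the location of your pain, and "
    "just say it. If none of them matches, please say 'none of those'. "
    "Here is the list: abdomen, ankles, back, ..., toes. Which one is it?",
]

def next_prompt(error_count: int) -> str:
    """Return the prompt for the current attempt, escalating with each
    detected recognition failure and staying at the most constrained
    level once all levels are exhausted."""
    level = min(error_count, len(ESCALATING_PROMPTS) - 1)
    return ESCALATING_PROMPTS[level]
```

The same escalation ladder also serves caller-initiated help: an explicit “I need help” can simply jump to the most constrained level.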
In the normal session the subject indicated a pain level of 7, which is above the set threshold of 5. Based on the protocol, a follow-up session is initiated at an interval of 4 hours instead of 24.
Experimental evaluation of the usability of the Pain Monitoring Voice Diary was performed with 24 volunteers, mostly students recruited on campus. The volunteers were asked to contribute ten sessions with the system over a period of two weeks; in practice the number of sessions per subject ranged from 1 to 12. No formal training session with the system was provided; instead, once enrolled (through a website), the subjects received an email notification with their PIN and general information about the system. The subjects were asked either to relate to pain episodes in their past while answering the system’s questions, or to use as guidance one of nine provided medical scenarios compiled by a pain specialist, ranging from migraines and back pain to post-surgery pain (knee injury) and cancer- and chemotherapy-related afflictions. We collected a total of 118 dialog sessions: 113 sessions were completed, while in 5 the caller hung up. 42 of the completed sessions were of the ‘follow-up’ type. There were a total of 1766 dialog turns, where a dialog turn corresponds to one system prompt and one user utterance. The data capture rate, measuring the percentage of slots filled automatically, was 98%; the other 2% were flagged for transcription. The data capture rate is not a direct measure of ASR accuracy, since slots are not necessarily filled after the first attempt. Among the utterances sent to transcription, where the user had opted for the ‘none of those’ option, 80% corresponded to the type-of-pain slot and 20% to the symptoms slot, indicating that those are the grammars with the highest out-of-vocabulary rate.
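The per-slot breakdown of transcription-flagged utterances reported above is a simple frequency count. The sketch below shows one way to compute it from a log of flagged slot names; the function name and slot labels are illustrative, not part of the Speech Matrix® system.

```python
# Illustrative: share of transcription-flagged utterances per slot type,
# as in the reported 80% / 20% split between 'pain_type' and 'symptoms'.
from collections import Counter

def transcription_breakdown(flagged_slots):
    """Given a list of slot names flagged for transcription, return the
    fraction of flagged utterances attributed to each slot."""
    counts = Counter(flagged_slots)
    total = sum(counts.values())
    return {slot: n / total for slot, n in counts.items()}
```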
Table 1 shows other metrics derived from the dialogs [7]: average session duration; number of dialog units per session; average duration of a dialog unit; average number of caller utterances per dialog unit; average duration of one dialog turn; percentage of barged-in prompts; and percentage of task-oriented prompts. The high standard deviations of session duration and dialog units per session are due to the extensive variability of the dialog sessions. Not only do the sessions differ by type (normal and follow-up), but there is also branching within the same session type (e.g., some subjects report symptoms while others don’t, some take medications, etc.). In addition, there is great variability due to ASR errors and the different paths inherent in the design of the call flow (e.g., caller-initiated help requests, and speech recognition error handling such as re-prompts and negative confirmations). The high standard deviations in caller utterances per dialog unit and in dialog unit duration are due to the fact that not all dialog units are created equal. For example, the ‘Are you in pain?’ dialog unit can fill a slot with a single ‘yes/no’ utterance, while the ‘Pain Location’ unit requires at least two caller utterances (body part and confirmation) when speech recognition does not fail, and more when it does. The percentage of task-oriented dialog turns (those that are NOT due to speech recognition errors or caller help requests) is a measure of dialog efficiency: if there were no errors and no help requests at all, it would be 100%. The prompts in the dialog were designed to be barged in on by experienced callers. To quantify the use of barge-in, we computed the percentage of barged-in prompts (66%). To quantify how far into the prompts barge-in occurs, we computed the average duration of a dialog turn (7.19 sec) and compared it to the reference average duration of a dialog turn (10.63 sec) when barge-in was disabled.
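The turn-level metrics discussed above (average turn duration, barge-in rate, and task-oriented turn percentage) can be computed directly from logged dialog turns. This is a minimal sketch under the definitions given in the text; the `Turn` record and function name are assumptions, and the numbers in the usage note are made-up illustrations, not the study’s results.

```python
# Illustrative sketch of computing Table-1-style metrics from a log of
# dialog turns. A turn pairs one system prompt with one user utterance;
# a turn is 'task-oriented' if it is not a re-prompt after an ASR error
# and not a response to a caller help request.
from dataclasses import dataclass

@dataclass
class Turn:
    duration_sec: float
    barged_in: bool       # caller spoke over the prompt
    task_oriented: bool   # not due to an ASR error or a help request

def session_metrics(turns):
    """Return average turn duration, barge-in percentage, and
    task-oriented turn percentage for a list of dialog turns."""
    n = len(turns)
    return {
        "avg_turn_duration_sec": sum(t.duration_sec for t in turns) / n,
        "pct_barged_in": 100.0 * sum(t.barged_in for t in turns) / n,
        "pct_task_oriented": 100.0 * sum(t.task_oriented for t in turns) / n,
    }
```

On a hypothetical four-turn session, the same computation that produced the reported 66% barge-in rate and 7.19-sec average turn duration would reduce to these three aggregates.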
To address those issues and to increase the efficiency of data collection while decreasing time and costs, Spacegate designed Speech Matrix®, an innovative and comprehensive clinical data collection system utilizing natural language, voice, and automated speech recognition (ASR) technologies for over-the-phone, real-time data collection. Speech Matrix® allows direct interaction with back-office system databases using any phone as an “input” device, and it is built on natural speech and choice of expression. Because people are accustomed to using telephones, data input through the telephone has the inherent advantages of familiarity, simplicity, and ease of access, and Speech Matrix® allows data to be collected in real time in the patient’s or participant’s native environment. Speech Matrix® Voice Data Collection applications are receiving an enthusiastic reception from the clinical and behavioral research communities, including the National Institutes of Health. The system brings a significant reduction in research and study costs, shortens the time for data collection, and provides a new, sophisticated, and extensive data collection methodology for clinical research.