accm-brfss-2022-presentation-draft.pptx

Developing a (partially) Automated Call Center Monitoring/Management Tool Using Machine Learning and Practical Insights
Matt Jans, Zoe Padgett, James Dayton, Don Allen, Josh Duell, Shawn Underwood, Mary Penn, Dave Roe, Lew Berman
BRFSS Annual Meeting (Virtual)
3/24/22

General interview quality assurance (QA) challenges
• Time and effort to review
- Organizations generally QA/QC 5-10% of workload (i.e., completes, dials, hours)¹
- Small %, but most aren’t problematic
- Difficult to balance breadth and depth
1 Jans et al. (2018). Roundtable: What Makes a Good Interviewer? Metrics and Methods for Ensuring Data Quality. Federal Computer-Assisted Survey Information Collection (FedCASIC) Workshop.

Motivating questions for an automated call center monitoring/management system
• Can we automate portions of our QA procedures?
• How do we monitor a larger percentage of work than we do with typical QA without increasing QA costs?
• Will automating mundane aspects of QA work allow QA staff to spend more time coaching and identifying more complex problems?

Has anyone done this?
• Comprehensive QA systems exist²
• Question text, audio recordings, and coding forms presented on a single screen
• Some automation³
• To our knowledge, no system has both elements
• Please tell us if you know of any!
2 Thissen, M. R. (2014). Computer Audio-Recorded Interviewing as a Tool for Survey Research. Social Science Computer Review, 32(1), 90–104. https://doi.org/10.1177/0894439313500128
3 Timbrook, J., & Eck, A. (2019, February 26). Humans vs. Machines: Comparing Coding of Interviewer Question-Asking Behaviors Using Recurrent Neural Networks to Human Coders. 2019 Workshop: Interviewers and Their Effects from a Total Survey Error Perspective, Lincoln, NE. (see full dissertation at https://www.proquest.com/openview/f3dd1734067619e90afe6d0d3c8a5c58/1?pq-origsite=gscholar&cbl=44156)

How we started to answer these questions
• Using survey interview recordings
• Machine learning speech recognition and language understanding (i.e., natural language processing)
• Human QA staff can only review so many recordings or parts of recordings each month
• Automatically identifying clear, simple problems so QA staff can spend more time on those that only humans can find
• Triaging and filtering problems for QA confirmation to simplify and standardize the QA task
• Ideally, “touching” 100% of interviews or calls, even if only on basic errors
• Long-term goal

ACCM development phases
• Prior to 2021 – Testing and development phases
  - Goals
    - Establish how well speech recognition could identify…
      - …interviewer and respondent turns and the words spoken
      - …misrecordings (i.e., interviewer doesn’t enter what respondent said)
  - What we learned
    - Works in situations with low audio fidelity (i.e., phone interviews, even mobile phones)
    - Tweaking speech recognition and parsing algorithms takes time and skill
    - Question and answer speech identified more reliably on simple questions
      - Yes/No questions work best
• 2021 forward – Implementation
  - Goal: Work ACCM into production QA review each month with Yes/No questions in BRFSS
  - What we learned: Very low rates of misrecordings; more details in progress
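
The Yes/No focus described in these phases can be illustrated with a minimal sketch. It assumes speech recognition has already produced a list of speaker-labeled turns. The turn format, the speaker labels, and the word lists are hypothetical and are not ACCM's actual output or parsing rules.

```python
import re

# Hypothetical word lists; a production parser would need many more variants.
YES_WORDS = {"yes", "yeah", "yep", "yup", "correct"}
NO_WORDS = {"no", "nope", "nah"}


def classify_yes_no(text):
    """Return "yes", "no", or None when the turn is ambiguous or unclear."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    said_yes = bool(tokens & YES_WORDS)
    said_no = bool(tokens & NO_WORDS)
    if said_yes == said_no:  # neither kind of word, or both kinds
        return None
    return "yes" if said_yes else "no"


def respondent_answer(turns):
    """Classify the first respondent turn that follows an interviewer turn."""
    seen_interviewer = False
    for speaker, text in turns:
        if speaker == "interviewer":
            seen_interviewer = True
        elif speaker == "respondent" and seen_interviewer:
            return classify_yes_no(text)
    return None


# Illustrative turns only, not real interview data.
turns = [
    ("interviewer", "Have you smoked at least 100 cigarettes in your entire life?"),
    ("respondent", "Um, yeah, I have."),
]
print(respondent_answer(turns))
```

Returning None for unclear turns, rather than guessing, keeps ambiguous speech out of the automated decision and leaves it for a human reviewer.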

Machine learning process flow

QA review process – all differences between ACCM and data record

What ACCM thinks respondent said
What interviewer entered
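
The comparison behind this review step (what ACCM thinks the respondent said versus what the interviewer entered) can be sketched as a simple filter that queues every difference for QA staff. This is a sketch under assumptions: the `Item` fields, the case IDs, and the choice to queue unclassified items are all hypothetical and do not describe the production ACCM system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Item:
    case_id: str                # hypothetical case identifier
    question: str               # question ID, e.g. "S9Q1"
    accm_answer: Optional[str]  # what ACCM thinks the respondent said; None if unclear
    entered_answer: str         # what the interviewer entered in the data record


def needs_qa_review(item: Item) -> bool:
    """Flag any difference between the ACCM answer and the data record.

    None never equals an entered answer, so items ACCM could not classify
    are also queued (an assumption made for this sketch).
    """
    return item.accm_answer != item.entered_answer


def review_queue(items: List[Item]) -> List[Item]:
    """Keep only the items a human reviewer should listen to."""
    return [item for item in items if needs_qa_review(item)]


# Illustrative records only, not real BRFSS data.
items = [
    Item("A001", "S9Q1", "yes", "yes"),
    Item("A002", "S9Q1", "no", "yes"),
    Item("A003", "S9Q1", None, "no"),
]
queue = review_queue(items)
print([item.case_id for item in queue])
```

A flagged item is only a candidate: a reviewer still has to decide whether the difference is an interviewer misrecording or a machine error.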

Adjudication process
QA Form (reduced) goes here (Check my notes on what stays/goes)
Include notes
“Description” only used when there’s a discrepancy (update text in box)

Adjudication process
Options QA staff can choose

QA staff code after review

Answering our own questions…

How we continue to answer these questions
• Yes, but with mixed empirical results
• Only 5.3% of ACCM-screened recordings identified as interviewer misrecording
• After QA review, only 0.2% (overall) misrecording (3.7% “machine errors”)
• 70% of cases flagged by ACCM
• Need to assess whether interviewer misrecording is worth automating
• Core of the ACCM system still meets this goal
• Probably! No direct measure right now, but future directions are positive

What’s next for ACCM
• Continue finding ways to review a larger percentage of interviews
- And aspects of the interview most likely to reveal the largest problems
• Investigating introductions (first 30 seconds)
• Correlation between easy-to-measure metrics and hard-to-measure ones
- Correlate misrecordings and problems in first 30 seconds with other coded interviewer issues like verbatim reading and neutral probing

ACCM Background
• Multiple small-scale pilots led by Lew Berman
• Major Finding
- High reliability between auto-coded responses and human-coded responses for simple questions with simple answer categories (Berman, Boyle, Allen, Duell, Jans, Iachan, McCoy, 2019)
• General self-rated health (S1Q1): 96% agreement (κ = 0.94)
• Smoked 100 cigarettes (S9Q1): 74% agreement (κ = 0.59)
• Seatbelt frequency (s13q1): 70% agreement (κ = n/a)
• Employment (s8q15): 61% agreement (κ = 0.50)
• Outdoor activity reduce/change (CT13_2): 20% agreement (κ = n/a)
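
For readers who want to see how figures like these are computed, here is a minimal sketch of percent agreement and kappa between auto-coded and human-coded responses. It assumes κ refers to Cohen's kappa, which the slides do not state, and the response lists are made up for illustration, not pilot data.

```python
from collections import Counter


def percent_agreement(auto, human):
    """Share of items where the auto code matches the human code."""
    if len(auto) != len(human) or not auto:
        raise ValueError("need two non-empty code lists of equal length")
    return sum(a == h for a, h in zip(auto, human)) / len(auto)


def cohens_kappa(auto, human):
    """Cohen's kappa: agreement corrected for chance; None if undefined."""
    n = len(auto)
    p_o = percent_agreement(auto, human)
    auto_counts = Counter(auto)
    human_counts = Counter(human)
    # Chance agreement from each coder's marginal category frequencies.
    p_e = sum(auto_counts[c] * human_counts[c] for c in auto_counts) / (n * n)
    if p_e == 1:
        return None  # both coders used a single identical category
    return (p_o - p_e) / (1 - p_e)


# Made-up codes for ten responses to a Yes/No question.
auto = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
human = ["yes", "yes", "no", "yes", "yes", "no", "yes", "no", "no", "yes"]
print(percent_agreement(auto, human), cohens_kappa(auto, human))
```

Kappa is lower than raw agreement because some matches would occur by chance alone, which is one reason both numbers are worth reporting together.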
