Users Really Do Answer Telephone Scams
Authors:
Huahong Tu, University of Maryland; Adam Doupé, Arizona State University; Ziming Zhao, Rochester Institute of Technology; Gail-Joon Ahn, Arizona State University and Samsung Research
Distinguished Paper Award Winner
Abstract:
As telephone scams become increasingly prevalent, it is crucial to understand what causes recipients to fall victim to these scams. Armed with this knowledge, effective countermeasures can be developed to challenge the key foundations of successful telephone phishing attacks.
In this paper, we present the methodology, design, execution, results, and evaluation of an ethical telephone phishing scam. The study performed 10 telephone phishing experiments on 3,000 university participants without prior awareness over the course of a workweek. Overall, we were able to identify at least one key factor---spoofed Caller ID---that had a significant effect in tricking the victims into revealing their Social Security number.
Full paper: https://www.usenix.org/conference/usenixsecurity19/presentation/tu
Users Really Do Answer Telephone Scams: USENIX Security 2019 Presentation
1. Users Really Do Answer
Telephone Scams
Huahong Tu (Raymond), UMD
Adam Doupé, ASU
Ziming Zhao, RIT
Gail-Joon Ahn, ASU & Samsung
Distinguished Paper Award | Aug 15, 2019 | #usesec19
4. Collect and listen to scam samples
• Collected over 150 telephone scam samples from the IRS, YouTube, SoundCloud, news sites, etc.
• Listened to each of them to identify different attributes.
5. What are the telephone scam attributes we’ve identified?
• Area Code: e.g. Washington (202), Local (480), Toll Free (800)
• Caller Name: a known name displayed with the caller ID
• Voice Production: e.g. human or synthesized voice
• Gender: e.g. male or female voice
• Accent: e.g. American or Indian accent
• Entity: who to impersonate, e.g. IRS or the university’s HR dept
• Scenario: provide motivation to divulge SSN, e.g. tax or payroll
6. How did we design our experiments?
• Design a minimum set of experiments that allow comparison of different properties of an attribute with a set of standard background conditions.
7. List of all our experiments and their attribute properties
Caller ID Area Code Location Caller Name Voice Production Gender Accent Entity Scenario
E1 202-869-XXX5 Washington, DC N/A Synthesizer Male American IRS Tax Lawsuit
E2 800-614-XXX9 Toll-free N/A Synthesizer Male American IRS Tax Lawsuit
E3 480-939-XXX6 University Location N/A Synthesizer Male American IRS Tax Lawsuit
E4 202-869-XXX0 Washington, DC N/A Synthesizer Female American IRS Tax Lawsuit
E5 202-869-XXX2 Washington, DC N/A Synthesizer Male American IRS Unclaimed Tax Return
E6 202-849-XXX7 Washington, DC N/A Human Male American IRS Tax Lawsuit
E7 202-869-XXX4 Washington, DC N/A Human Male Indian IRS Tax Lawsuit
E8 480-462-XXX3 University Location N/A Synthesizer Male American ASU Payroll Withheld
E9 480-462-XXX5 University Location W-2 Administration Synthesizer Male American ASU Payroll Withheld
E10 480-462-XXX7 University Location N/A Synthesizer Male American ASU Bonus Issued
8. How did we gather our phone call recipients?
• Downloaded our university’s public phone directory entries for staff and faculty.
• Removed telephone numbers of people already aware of
the study.
• Randomly selected 3,000 telephone numbers and
assigned 300 to each experiment.
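The sampling and assignment step above can be sketched in Python. This is a minimal illustration: the directory contents and the `aware` set are hypothetical placeholders, not the study's data.

```python
import random

# Hypothetical stand-in for the university's public phone directory.
directory = [f"480-555-{i:04d}" for i in range(8000)]
aware = {"480-555-0000", "480-555-0001"}  # people already aware of the study

# Remove numbers of people already aware, then randomly select 3,000.
candidates = [n for n in directory if n not in aware]
random.seed(19)  # fixed seed so the sketch is reproducible
selected = random.sample(candidates, 3000)

# Assign 300 recipients to each of the 10 experiments E1..E10.
groups = {f"E{i + 1}": selected[i * 300:(i + 1) * 300] for i in range(10)}
```

Randomizing before slicing ensures each experiment gets an unbiased 300-recipient group.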
9. Steps we took to mitigate the risks to our recipients
• Worked with IRB on our experimental process.
• In all experiments, no SSN was actually collected.
• Upon entering any SSN digit, the user was immediately informed that the call was just an experiment and that no SSN was actually collected; IRB contact information was given at the end.
• Each recipient only received one phone call.
• Prior to dissemination, we communicated and coordinated with
the HR dept and tech support office.
10. Dissemination
• Set up our experiments using an online robocalling
platform.
• 10 experiments can run simultaneously.
• Limited all experiments to a single workweek, during the work hours of 10am – 5pm.
• Outbound and return calls were directed to the start of each experiment’s standard procedure.
19. Day 1 Day 2 Day 3 Day 4 Day 5
• 2 hours and 45 minutes since launch:
• The school of journalism and mass communication
identified our scam calls…
• They did not consult with the IT department and sent out
mass emails in their dept to warn about the scam calls.
20.
• 4 hours and 22 minutes since launch:
• The university’s telephone service office started blocking
our phone calls…
• Our calls were triggering IT system alerts as they were
exhausting the university’s telephone trunk routes.
• So we had to reduce the rate of outgoing calls.
21.
• Day 2 since launch:
• The IRB received many complaints…
• So they asked us to pause our experiments so that they could verify the study was proceeding as described.
• 12 hours later, after review, they found everything was in
order, and suggested we proceed.
24. Finding an Analysis Metric
• Entered SSN: # of users who entered a digit when asked for the last 4 SSN digits
Issue: too lax as a measure, since users could have entered fake SSNs
• Convinced: # of users who entered 1, indicating that they were convinced by the scam
Issue: too sparse, as users rarely indicated that they were convinced by the scam
25. Our Chosen Metric
• Possibly Tricked: # of Entered SSN users − # of Unconvinced users
A more reasonable estimate of the actual number of recipients that fell for the scam: not too lax and not too sparse.
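As a minimal sketch of the metric (the counts below are made up for illustration, not the study's results):

```python
# Per-experiment counts (hypothetical numbers for illustration only).
counts = {
    "entered_ssn": 45,  # users who entered at least one SSN digit
    "unconvinced": 14,  # users who explicitly said they were unconvinced
}

def possibly_tricked(entered_ssn: int, unconvinced: int) -> int:
    """Possibly Tricked = Entered SSN - Unconvinced."""
    return entered_ssn - unconvinced

n = possibly_tricked(counts["entered_ssn"], counts["unconvinced"])
rate = n / 300  # each experiment had 300 recipients
```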
27. Results of Possibly Tricked
[Bar chart of Possibly Tricked rates by experiment: E9 10.33%, E8 7.00%, E10 6.00%, E2 4.00%, E4 3.33%, E3 2.00%, E6 2.00%, E7 1.33%, E1 0.67%, E5 0.33%]
Callout on E9: your payroll is withheld by the University; caller ID shows W-2 Administration
28. Results of Possibly Tricked
[Same bar chart as slide 27]
Callout on E5: you have an Unclaimed Tax Return from the IRS
29. Linear regression coefficients of all attribute properties
Local
TollFree
Washington,DC
Unknown
Known
Synthetic
Human
Male
Female
American
Indian
IRS
ASU
TaxLawsuit
UnclaimedTaxReturn
PayrollWithheld
BonusIssued
Area Code Caller
Name
Voice
Production
Gender Accent Entity Scenario
30. Statistical significance & effect size of comparable attribute properties
[Chart (y-axis 0–0.7) of adjusted p-value and effect size for each comparison: Entity Scenario (IRS vs. HR), Area Code (202 vs. 800), Voice Gender (Male vs. Female), Voice Production (Synthetic vs. Human), Motivation (Reward vs. Fear), Caller Name (Unknown vs. Known), Voice Accent (Indian vs. American); legend: Conclusive / Somewhat / Not Conclusive]
32. Reasons Unconvinced
[Bar chart of reasons, with response counts:]
• The IRS / ASU won't make calls like this: 16
• Already aware of scams like this: 3
• Not from ASU caller ID number: 3
• Did not ask for full SSN: 2
• Asked to enter SSN: 2
• Indian accent: 2
• Did not sound legit / convincing: 2
• Synthetic voice: 1
33. Spearphishing is effective
• Telephone scammers may spoof a
known caller ID name and voice a
plausible scenario to make the scam
exceptionally convincing.
34. Ways to protect users
• Make users aware of telephone scams.
• E.g. “HR won’t make calls like this”
• Adopt caller ID authentication technology.
• Provide safeguards against caller ID spoofing
• Fight malicious calls with a caller ID reputation system
• More research into understanding scammers.
35. Thank you for your attention!
Post your questions to @h2raymond
Editor's Notes
Hi, my name is Raymond Tu from the University of Maryland. Today it is my pleasure to talk about our research on telephone scams.
I’m very grateful for receiving the distinguished paper award.
Let me ask you all a question: how many of you received a robocall today?
How many of you have never received a robocall?
Robocalls are such a problem that John Oliver did an entire episode on them.
According to the FCC, nearly half of the calls you receive will be spam, scam, or robocalls.
We understand this is a serious issue and wanted to do something about it, inspired by a fellow research paper by Matt Tischer three years ago at Oakland, where I also had a paper.
In this research, we have decided to ask the question,
What causes the users to answer and fall victim to telephone scam?
So, How did we conduct our research?
How do you conduct a study like this?
We scammed people, and here’s how we did it.
First, we collected 150+ scam samples and identified attributes.
Rather than simply replicating the scams, we broke the scams into visual and voice attributes
And our experiments were designed to test these attributes and see what made the most impact to the attack success
So after we designed the experiments, we Disseminate phone calls, Collect and tabulate results, Select analysis criteria and present analysis results, and Provide evaluations and recommendations
First, to understand what contributes to telephone scams, we collected over 150 telephone scam samples from various public sources and listened to all of them with the goal to identify different attributes.
After listening to the scam samples, what are the scam attributes we identified?
Here’s the list of attributes that we identified:
Area Code: e.g. 202, 480, 800
Caller Name: name associated with the caller ID
Voice Production: e.g. human or synthesized
Gender: e.g. male or female
Accent: e.g. American or Indian
Entity: who to impersonate, e.g. IRS or HR
Scenario: motivation to divulge SSN, e.g. tax or payroll issue scenario
With these attributes in mind, how did we design our experiments?
This is our design principle:
Design a minimum set of experiments that allow comparison of different properties of an attribute with a set of standard background conditions.
And so here is the list of all our experiments and their attribute properties.
There are 10 experiments in total; each of them is designed to test a specific type of attribute property.
With the 10 experiments designed, the next step was to gather the phone recipients for our experiments.
To do that, we emulated what a real-world spammer would do, which is to download or crawl our university's public telephone directory entries for staff and faculty.
After gathering those phone numbers, we removed the numbers of those people that were already aware of our study, such as people we worked with from the IT and IRB department.
After that, we randomly selected 3000 phone numbers and assigned 300 to each experiment.
Also, to mitigate the risks to our recipients, we worked with the IRB to design our experimental process.
For instance, in all experiments, we made sure that no ssn was actually collected.
And, upon entering any SSN digit, the user was immediately informed that the call was actually just an experiment, that no SSN was collected, and IRB contact information was also given.
Also, each recipient would receive only one phone call from us.
Finally, prior to dissemination, we coordinated with the HR dept and tech support office to ensure proper response to our calls.
With that out of the way, to ensure the entire procedure was completely standardized and automated, we set up our experiments using an online robocalling platform.
In the online platform, we set up our account with 10 different campaigns, so that the 10 experiments can run simultaneously.
We also limited the experiments to a single work week, during the work hours of 10am – 5pm.
Finally, in each experiment, the outbound and return calls were directed to the start of each experiment’s standard procedure.
So here’s what the standard procedure looks like:
First, we start with ringing the recipient’s work phone and displaying the visual attributes of the experiment.
Here for example, we show a 480 area code that is local to our university’s location.
If the recipient picked up the phone, we played what we called a “scenario announcement message” with the voice attributes of the experiment. Here’s an example.
At the end, we asked the recipient to continue by pressing 1, if they did so,
We play a follow-up message that requests the recipient to enter the last 4 digits of their Social Security number. Here's what it sounds like.
During this step, if the user presses any digit on the phone, they are immediately directed to the next step.
At this point, the user hears a debriefing announcement that asks them to participate in our survey. Here's what it sounds like; it's very long, so I will just play a part of it.
At the end of this message, we asked the user to participate in our survey by pressing 1. If they pressed 1 to continue,
we followed up with some survey questions, such as “were you convinced by the scam?” and “what were the reasons you were convinced or unconvinced by the scam?”, and then we recorded their voice responses.
Here’s what the first question sounds like.
After this step, they hear an IRB statement and contact info for any questions or concerns.
That's the standard procedure of each call, so we were ready to actually start sending out our calls.
Here are the call logs of all recipients that pressed 1 to continue during the experiments.
As you can see, during the dissemination several unexpected incidents came up.
At 2 hours and 45 minutes since launch,
The School of journalism and mass communication identified our scam calls.
They did not consult with the IT department and sent out mass emails in their dept to warn about the scam calls.
At 4 hours and 22 minutes since launch:
The university’s telephone service office started blocking our phone calls…
Our calls were triggering IT system alerts as they were exhausting the university’s telephone trunk routes.
So we had to reduce the rate of outgoing calls.
At Day 2 since launch:
The IRB received many complaints…
So they asked us to pause our experiments so that they could verify the study was proceeding as described.
12 hours later, after review, they found everything was in order, and suggested we proceed.
After completing the experiments, these are the results we collected:
In the data we collected, there are 6 different actions that we measured:
Continued is the number of people who continued after listening to the announcement message.
Entered is the number of people who entered a digit of their Social Security number.
Convinced is the number of people who explicitly stated that they were convinced by the scam during the survey question.
Recording is the number of people who stated the reason why they were convinced.
Unconvinced is the number of people who explicitly stated that they were unconvinced by the scam during the survey question.
Recording is the number of people who stated the reason why they were unconvinced.
With our data, to perform an analysis, we needed to find an Analysis Metric.
This was a challenging task because if we used Entered SSN as our metric, it could be too lax a measure, since users could have entered fake SSNs.
If we used Convinced as our metric, the data could be too sparse, as users rarely indicated that they were convinced by the scam.
So in the end, we settled on Possibly Tricked as our metric, which is derived by subtracting the # of Unconvinced from Entered SSN.
It provided a more reasonable estimate of the actual number of recipients that fell for the scam.
Here’s the result of Possibly Tricked across different experiments.
For the most successful experiment, we had 10.3% of recipients possibly tricked.
For the least successful experiment, only 0.3% were possibly tricked.
To better understand the attributes, we performed further analysis on Possibly Tricked:
comparing linear regression coefficients of all attribute properties, and
comparing statistical significance & effect size of comparable attribute properties.
In this chart, all attribute properties were (intentionally over-) fitted on our Possibly Tricked data to obtain the linear regression coefficients.
The other analysis was statistical significance & effect size for the comparable attribute properties across comparable experiments; we found that changing the entity/scenario had the most conclusive results. This was calculated based on the adjusted p-value, which was stepped down using the Holm-Bonferroni method, and the effect size, which was based on Cohen's d.
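The two statistics mentioned here, Holm-Bonferroni-adjusted p-values and Cohen's d, can be sketched in plain Python. This is a minimal illustration of the standard formulas, not the authors' analysis code, and the example inputs are made up.

```python
import math

def holm_bonferroni(pvals):
    """Step-down Holm-Bonferroni adjustment of a list of raw p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        adj = min(1.0, (m - rank) * pvals[i])   # multiply by remaining tests
        running_max = max(running_max, adj)     # enforce monotone adjustment
        adjusted[i] = running_max
    return adjusted

def cohens_d(a, b):
    """Cohen's d effect size between two samples, using the pooled std dev."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled
```

For example, `holm_bonferroni([0.01, 0.04, 0.03])` multiplies the smallest p-value by 3, the next by 2, and the largest by 1, then enforces monotonicity.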
Finally, we also analyzed the recordings of survey participants who stated that they were convinced by the scam, and here are the results.
Here are the results for the Reasons Unconvinced. As you can see, the main reason they were unconvinced was that they suspected the IRS or ASU won't make calls like this.
What can we learn from our study?
Spearphishing is effective
Telephone scammers may spoof a known caller ID name and voice a plausible scenario to make the scam exceptionally convincing.
In defense, we recommend the following ways to protect users:
Make users aware of telephone scams.
Teach users that HR won't make calls like this.
Adopt caller ID authentication technology.
More research into the understanding of scammers.