4. What is mTurk?
• “Access to an on-demand, scalable workforce.”
• Requesters post Human Intelligence Tasks (HITs), and people all around the world (known as “workers” or “Turkers”) complete these for some amount of compensation.
9. Advantages
1. Scalability.
2. Diversity.
3. Turnaround.
4. Low cost.
5. Anonymity.
Tens of thousands of Turkers can be accessed for a given study, helping with statistical power and our ability to screen for specific qualifications.
17. “Full Stack Developer Full
Stack Developer Full Stack
Developer Full Stack Developer
Full Stack Developer Full Stack
Developer Full Stack Developer
Full Stack Developer”
mTurk worker meeting a 20-word requirement
18. “Friends? I work in IT. The
only friends I have are
here and we try to talk
about anything but it.”
mTurk worker faking a response
20. “When surveys like this come up and no one seems
to qualify, I liked to go incognito and see what
demographic they want out of curiosity.
Before you downvote, I don't take the surveys once I
get it because I do research myself and I take data
purity seriously. Anyways, I couldn't find what
[requester] was after. Literally every demographic I
selected did not qualify.”
Reddit
28. Screening for your niche
1. Zero Indication.
2. Googled Answers.
3. Fake Tooling.
4. ID & IP.
5. Review.
Provide workers with no indication of the “right” answer.
29. Screening for your niche
1. Zero Indication.
What you wrote:
1. Are you a software developer?
a. Yes
b. No
30. Screening for your niche
1. Zero Indication.
What they see:
1. We are looking for software developers. Do you have this job title we are looking for? If you say yes, we will give you money.
a. …sure, give me money
b. nah, I don’t want money
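One way to put the “zero indication” idea into practice is to bury the target role among plausible decoys and randomize the order, so the screener never telegraphs which answer pays. A minimal sketch, assuming a hypothetical role list and question wording (none of these names come from the talk):

```python
import random

# Hypothetical target and decoy roles -- the screener should not reveal
# which one the study is actually recruiting for.
ROLES = [
    "Software Developer",   # the role we actually want (never disclosed)
    "Accountant",
    "Registered Nurse",
    "Sales Representative",
    "Graphic Designer",
]

def build_screener(seed=None):
    """Return a screener question with shuffled, equally weighted options."""
    rng = random.Random(seed)
    options = ROLES[:]
    rng.shuffle(options)
    options.append("None of the above")  # always last, gives an easy out
    return {
        "question": "Which of the following best describes your current job?",
        "options": options,
    }
```

Because every option looks equally “qualifying,” a worker gaming the screener has no better strategy than guessing.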
32. Screening for your niche
2. Googled Answers.
Test their domain knowledge with short-answer questions. When doing this, look out for internet-definition answers (some tooling may be able to automate this for you).
36. Screening for your niche
4. ID & IP.
Depending on the tooling you use, you’re able to screen out duplicate or incognito participants automatically, based on their given IP or mTurk ID.
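If your tooling doesn’t handle this for you, the deduplication logic is simple enough to run yourself over exported submissions. A sketch, assuming a `worker_id`/`ip` field layout that your export may or may not match:

```python
def screen_duplicates(submissions):
    """Keep the first submission per worker ID and per IP; flag the rest."""
    seen_workers, seen_ips = set(), set()
    accepted, flagged = [], []
    for sub in submissions:
        if sub["worker_id"] in seen_workers or sub["ip"] in seen_ips:
            flagged.append(sub)  # repeat worker, or a shared/incognito IP
        else:
            seen_workers.add(sub["worker_id"])
            seen_ips.add(sub["ip"])
            accepted.append(sub)
    return accepted, flagged
```

Note that shared IPs are only a signal (households and offices share them), so flagged rows go to manual review rather than automatic rejection.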
37. Screening for your niche
5. Review.
Never depend solely on your screening methods to weed everyone out automatically. Take the time to comb through accepted responses to ensure no unqualified participants sneak through.
38. Screening for your niche
• So what is it going to cost me to screen out this many people if my niche is such a small share of the larger Turker pool?
40. Qualification Surveys
• Consider separating your screener from your substantive research activity.
• Example screener HIT:
“Take a ~2 minute qualification survey to become qualified for higher-paying HITs.”
41. Qualification Surveys
• If you choose not to compensate those who do not qualify, ensure you are upfront about compensation and disqualify them early.
• Amazon’s policies do not prevent you from doing this, but we want to ensure we are acting ethically and managing participants’ expectations.
43. Panel Creation
• Whether or not you use separate qualification surveys, you can tag Turkers who take part in any of your research activities.
44. Panel & Database Creation
• As you continue to scale and your panel keeps growing, you will likely want to look to database solutions to track these panels.
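A lightweight starting point before reaching for a full database service is the standard library’s sqlite3 with a single panel-membership table. This schema is a sketch for illustration, not the one used in the talk:

```python
import sqlite3

def open_panel_db(path=":memory:"):
    """Open (or create) a panel-tracking database."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS panel_members (
            worker_id  TEXT NOT NULL,
            panel      TEXT NOT NULL,
            tagged_at  TEXT DEFAULT CURRENT_TIMESTAMP,
            PRIMARY KEY (worker_id, panel)
        )
    """)
    return conn

def tag_worker(conn, worker_id, panel):
    # INSERT OR IGNORE makes re-tagging the same worker idempotent
    conn.execute(
        "INSERT OR IGNORE INTO panel_members (worker_id, panel) VALUES (?, ?)",
        (worker_id, panel),
    )
    conn.commit()

def panel_members(conn, panel):
    rows = conn.execute(
        "SELECT worker_id FROM panel_members WHERE panel = ?", (panel,)
    )
    return [r[0] for r in rows]
```

The composite primary key means a worker can belong to several panels but is never double-counted within one, which keeps invite lists clean as screening batches accumulate.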
45. Continuous panel curation
• Depending on your use cases, frequency of studies, and niche, you may want to continuously add to your panel.
• Last year, we continuously screened batches of 1,000–2,000 participants for our broader panel.
46. Scaling threats
• Panel retention can be an unknown.
• When publishing a task to panelists, we’ve had up to 50% of invitees complete the study.
• Directly messaging panelists (via a worker bonus) can improve this.
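The “message via bonus” trick works because MTurk’s SendBonus operation carries a free-text Reason that the worker sees. A hedged sketch: the amount, wording, and assignment ID are placeholders, and the client is passed in (in production it would be something like `boto3.client("mturk")`) so the call can be exercised without AWS credentials:

```python
def invite_panelist(client, worker_id, assignment_id, message, amount="0.01"):
    """Reach a past worker by sending a small bonus whose Reason carries
    the message. SendBonus requires an AssignmentId from a HIT the worker
    previously completed for you, so this only works for panelists.
    """
    return client.send_bonus(
        WorkerId=worker_id,
        AssignmentId=assignment_id,
        BonusAmount=amount,   # the API expects a string amount, in USD
        Reason=message,       # this text is what the worker actually reads
    )
```

Keeping the bonus nominal and the Reason informative (“A new study you qualify for is live”) respects the spirit of the mechanism while improving invite completion.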
47. Scaling threats
• Scale can amplify problems with workers.
• If 1% of workers report your HIT, that’s not a problem with a sample of 100, but it becomes an issue in a sample of 10,000.
• Be extremely transparent and over-communicate to prevent issues.
50. Some tips for interactions
• Be ethical.
• Err on the side of accepting submissions/paying workers.
• Give bonuses for thoughtful responses (be upfront about the possibility in your activity).
• Do not ask for PII, including contact info — if you need to contact a worker, you can give them a “bonus” with a message, to which they can respond.
• Workers may report you to Amazon, even if you leave an arguably-PII field as “optional.”
• Protect data as if it’s identifiable.
53. A/B Testing: The Solution
• At the time, we had already been curating a panel of participants in the IT, Operations, and Development space.
• Within a couple of days of launching a test, we received approximately 100 responses.
55. A/B Testing: The Results
• Although there was no clear-cut winner when it came to outright asking participants’ preference, this exercise wasn’t a total loss.
• Many of those who preferred one of the color palettes mentioned that the contrast of the other was too low, resulting in poor legibility.
57. Accessibility (WIP): The Question
• After seeing the influence accessibility had on our color palette debate, we started to wonder if we would be able to test this type of thing more directly.
• We wanted to see what kinds of vision impairments were present within mTurk.
67. Final Thoughts
• mTurk is NOT the be-all and end-all recruitment solution.
• When used properly, mTurk is a cleaned-up version of the quick-and-dirty approach: a way to get results fast.