4. What is mTurk?
• “Access to an on-demand, scalable workforce.”
• Requesters post Human Intelligence Tasks (HITs), and people all around the world (known as “workers” or “Turkers”) complete these for some amount of compensation.
9. Advantages
1. Scalability.
2. Diversity.
3. Turnaround.
4. Low cost.
5. Anonymity.
Tens of thousands of Turkers can be accessed for a given study, helping with statistical power and our ability to screen for specific qualifications.
17. “Full Stack Developer Full
Stack Developer Full Stack
Developer Full Stack Developer
Full Stack Developer Full Stack
Developer Full Stack Developer
Full Stack Developer”
mTurk worker meeting a 20-word requirement
18. “Friends? I work in IT. The
only friends I have are
here and we try to talk
about anything but it.”
mTurk worker faking a response
20. “When surveys like this come up and no one seems
to qualify, I liked to go incognito and see what
demographic they want out of curiosity.
Before you downvote, I don't take the surveys once I
get it because I do research myself and I take data
purity seriously. Anyways, I couldn't find what
[requester] was after. Literally every demographic I
selected did not qualify.”
Reddit
28. Screening for your niche
1. Zero Indication.
2. Googled Answers.
3. Fake Tooling.
4. ID & IP.
5. Review.
Provide workers with no indication of the “right” answer.
29. Screening for your niche
1. Zero Indication.
What you wrote:
1. Are you a software developer?
a. Yes
b. No
30. Screening for your niche
1. Zero Indication.
What they see:
1. We are looking for software developers. Do you have this job title we are looking for? If you say yes, we will give you money.
a. …sure, give me money
b. nah, I don’t want money
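One way to put the “zero indication” idea into practice is to bury the target role among plausible decoys and randomize the order, so the screener never telegraphs which answer pays. A minimal sketch, assuming a hypothetical role list and question wording (none of these names come from the talk):

```python
import random

# Hypothetical target and decoy roles -- the screener should not reveal
# which one the study is actually recruiting for.
ROLES = [
    "Software Developer",   # the role we actually want (never disclosed)
    "Accountant",
    "Registered Nurse",
    "Sales Representative",
    "Graphic Designer",
]

def build_screener(seed=None):
    """Return a screener question with shuffled, equally weighted options."""
    rng = random.Random(seed)
    options = ROLES[:]
    rng.shuffle(options)
    options.append("None of the above")  # always last, gives an easy out
    return {
        "question": "Which of the following best describes your current job?",
        "options": options,
    }
```

Because every option looks equally “qualifying,” a worker gaming the screener has no better strategy than guessing.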
32. Screening for your niche
2. Googled Answers.
Test their domain knowledge with short-answer questions. When doing this, look out for internet-definition answers (some tooling may be able to automate this for you).
36. Screening for your niche
4. ID & IP.
Depending on the tooling you use, you’re able to screen out duplicate or incognito participants automatically, based on their given IP or mTurk ID.
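If your tooling doesn’t handle this for you, the deduplication logic is simple enough to run yourself over exported submissions. A sketch, assuming a `worker_id`/`ip` field layout that your export may or may not match:

```python
def screen_duplicates(submissions):
    """Keep the first submission per worker ID and per IP; flag the rest."""
    seen_workers, seen_ips = set(), set()
    accepted, flagged = [], []
    for sub in submissions:
        if sub["worker_id"] in seen_workers or sub["ip"] in seen_ips:
            flagged.append(sub)  # repeat worker, or a shared/incognito IP
        else:
            seen_workers.add(sub["worker_id"])
            seen_ips.add(sub["ip"])
            accepted.append(sub)
    return accepted, flagged
```

Note that shared IPs are only a signal (households and offices share them), so flagged rows go to manual review rather than automatic rejection.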
37. Screening for your niche
5. Review.
Never depend solely on your screening methods to weed everyone out automatically. Take the time to comb through accepted responses to ensure no unqualified participants sneak through.
38. Screening for your niche
• So what is it going to cost me to screen out this many people if my niche is such a small share of the larger Turker pool?
40. Qualification Surveys
• Consider separating your screener from your substantive research activity.
• Example screener HIT:
“Take a ~2 minute qualification survey to become qualified for higher-paying HITs.”
41. Qualification Surveys
• If you choose not to compensate those who do not qualify, ensure you are upfront about compensation and disqualify them early.
• Amazon’s policies do not prevent you from doing this, but we want to ensure we are acting ethically and managing participants’ expectations.
43. Panel Creation
• Whether or not you use separate qualification surveys, you can tag Turkers who take part in any of your research activities.
44. Panel & Database Creation
• As you continue to scale and your panel keeps growing, you will likely want to look to database solutions to track these panels.
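A lightweight starting point before reaching for a full database service is the standard library’s sqlite3 with a single panel-membership table. This schema is a sketch for illustration, not the one used in the talk:

```python
import sqlite3

def open_panel_db(path=":memory:"):
    """Open (or create) a panel-tracking database."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS panel_members (
            worker_id  TEXT NOT NULL,
            panel      TEXT NOT NULL,
            tagged_at  TEXT DEFAULT CURRENT_TIMESTAMP,
            PRIMARY KEY (worker_id, panel)
        )
    """)
    return conn

def tag_worker(conn, worker_id, panel):
    # INSERT OR IGNORE makes re-tagging the same worker idempotent
    conn.execute(
        "INSERT OR IGNORE INTO panel_members (worker_id, panel) VALUES (?, ?)",
        (worker_id, panel),
    )
    conn.commit()

def panel_members(conn, panel):
    rows = conn.execute(
        "SELECT worker_id FROM panel_members WHERE panel = ?", (panel,)
    )
    return [r[0] for r in rows]
```

The composite primary key means a worker can belong to several panels but is never double-counted within one, which keeps invite lists clean as screening batches accumulate.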
45. Continuous panel curation
• Depending on your use cases, frequency of studies, and niche, you may want to continuously add to your panel.
• Last year, we continuously screened batches of 1,000–2,000 participants for our broader panel.
46. Scaling threats
• Panel retention can be an unknown.
• When publishing a task to panelists, we’ve had up to 50% of invitees complete the study.
• Directly messaging panelists (via a worker bonus) can improve this.
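The “message via bonus” trick works because MTurk’s SendBonus operation carries a free-text Reason that the worker sees. A hedged sketch: the amount, wording, and assignment ID are placeholders, and the client is passed in (in production it would be something like `boto3.client("mturk")`) so the call can be exercised without AWS credentials:

```python
def invite_panelist(client, worker_id, assignment_id, message, amount="0.01"):
    """Reach a past worker by sending a small bonus whose Reason carries
    the message. SendBonus requires an AssignmentId from a HIT the worker
    previously completed for you, so this only works for panelists.
    """
    return client.send_bonus(
        WorkerId=worker_id,
        AssignmentId=assignment_id,
        BonusAmount=amount,   # the API expects a string amount, in USD
        Reason=message,       # this text is what the worker actually reads
    )
```

Keeping the bonus nominal and the Reason informative (“A new study you qualify for is live”) respects the spirit of the mechanism while improving invite completion.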
47. Scaling threats
• Scale can amplify problems with workers.
• If 1% of workers report your HIT, that’s not a problem with a sample of 100, but it becomes an issue in a sample of 10,000.
• Be extremely transparent and over-communicate to prevent issues.
50. Some tips for interactions
• Be ethical.
• Err on the side of accepting submissions/paying workers.
• Give bonuses for thoughtful responses (be upfront about the possibility in your activity).
• Do not ask for PII, including contact info — if you need to contact a worker, you can give them a “bonus” with a message, to which they can respond.
• Workers may report you to Amazon, even if you leave an arguably-PII field as “optional.”
• Protect data as if it’s identifiable.
53. A/B Testing: The Solution
• At the time, we had already been curating a panel of participants in the IT, Operations, and Development space.
• Within a couple of days of launching a test, we received approximately 100 responses.
55. A/B Testing: The Results
• Although there was no clear-cut winner when it came to outright asking participants’ preference, this exercise wasn’t a total loss.
• Many of those who preferred one of the color palettes mentioned that the contrast of the other was too low, resulting in poor legibility.
57. Accessibility (WIP): The Question
• After seeing the influence accessibility had on our color palette debate, we started to wonder if we would be able to test this type of thing more directly.
• We wanted to see what kinds of vision impairments were present within mTurk.
67. Final Thoughts
• mTurk is NOT the be-all and end-all recruitment solution.
• When used properly, mTurk is a cleaned-up version of the quick-and-dirty approach: a way to get results fast.