How Clara Labs works behind the scenes (UC Berkeley 2017)
1. Strategies for integrating people and machine learning in online systems
Jason Laska Ph.D.
Machine Learning at Clara Labs
September 2017 | UC Berkeley RISE Seminar
presenting the work of
Michael Akilian, Briana Burgess, Joey Carmello, Matthew Ebeweber, David Gouldin, Evan Hadfield,
Amritha Jyanti, Olga Narvskaia, Maran Nelson, Jodi Nicolli, Emily Pitts, Gavin Schulz, Oliver Song
@claralabs www.claralabs.com @chappaquack
4. Example: Suggesting times
“Let’s meet in Palo Alto for coffee next week.”
[Figure: next week's calendar, M to F, 8am to 5pm, with busy blocks, an OOO block, and Lunch blocks. Recurring location by day: SF, PA, PA, SF, SF.]
12. Example: Suggesting times
“Let’s meet in Palo Alto for coffee next week.”
[Figure: next week's calendar, M to F, 8am to 5pm, with busy blocks, an OOO block, and Lunch blocks; open questions are marked "?". Recurring location by day: SF, PA, PA, SF, SF.]
Apply constraints:
Location: Palo Alto
Coffee: 8am to Noon (preference)
Max Daily Meetings: 3 (preference)
Relax: can relax constraints if there’s enough travel time?
Apply NLP on calendar:
OOO: out of the office, what does that mean in this context?
Lunch: can we schedule over this or is it important?
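The constraint steps on this slide can be sketched in code. This is a minimal illustration, not Clara's implementation: it assumes location and busy blocks are hard constraints, and that the coffee window and the daily meeting cap are soft preferences that only affect ordering. The `Slot` type, the `suggest` function, and the sample calendar are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    day: str       # "M", "Tu", "W", "Th", "F"
    start: int     # start hour on a 24h clock
    location: str  # recurring location for that day


def suggest(slots, busy, meetings_per_day, location, window, max_daily):
    """Filter on hard constraints, then order by violated preferences."""
    # Hard constraints (assumed): right location and not already busy.
    feasible = [
        s for s in slots
        if s.location == location and (s.day, s.start) not in busy
    ]
    lo, hi = window

    def violations(s):
        outside_window = not (lo <= s.start < hi)
        day_is_full = meetings_per_day.get(s.day, 0) >= max_daily
        return int(outside_window) + int(day_is_full)

    # sorted() is stable, so ties keep calendar order.
    return sorted(feasible, key=violations)


# Hypothetical week using the recurring locations from the slide.
locations = {"M": "SF", "Tu": "PA", "W": "PA", "Th": "SF", "F": "SF"}
slots = [Slot(d, h, loc) for d, loc in locations.items() for h in (9, 14)]
busy = {("Tu", 14)}   # made-up busy block
meetings = {"W": 3}   # Wednesday already has 3 meetings

result = suggest(slots, busy, meetings, "PA", (8, 12), 3)
print([(s.day, s.start) for s in result])
```

Treating preferences as a penalty count rather than a filter is one way to leave room for the "Relax" question on the slide: a slot that breaks a preference is still offered, just later in the list.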
13. Example: Suggesting times
[Figure: the same calendar, M to F, 8am to 5pm, with busy, OOO, and Lunch blocks and open questions marked "?". Recurring location by day: SF, PA, PA, SF, SF.]
customers really want graceful and intuitive edge-case handling
16. How Clara handles this example
preference constraints
participant availabilities/unavailabilities
any accessible party calendars
integrated with calendar
17. Breaking work into tasks
Predict
Compute
Annotate
Review
Override
Task Type
“Let’s meet in Palo Alto
for coffee next week.”
location: Palo Alto
channel: coffee
time-pref: next week
intent: schedule
18. Breaking work into tasks
Predict
Compute
Annotate
Review
Override
Task Type
[Figure: single parameter example; labels: detector only, simple high precision rules (before feedback), after feedback]
feedback loop to machine learning
19. Breaking work into tasks
Predict
Compute
Annotate
Review
Override
Task Type
location: Palo Alto
channel: coffee
time-pref: next week
intent: schedule
state: new + action: suggest times
20. Breaking work into tasks
Predict
Compute
Annotate
Review
Override
Task Type
feedback loop to
product and engineering
21. Breaking work into tasks
Predict
Compute
Annotate
Review
Override
Task Type
22. Sessions
a sequence of completed tasks produces Clara’s output
[Figure: a session as a sequence of tasks: annotation tasks (task 1, task 2, ..., task n), review tasks (task n+1, ..., task m), override tasks (task m + 1, ...), with predict and compute steps alongside]
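A session can be pictured as an ordered list of completed tasks whose outputs are combined. The sketch below is only an illustration under one stated assumption: when two tasks set the same field, the later task wins (for example, an annotation correcting a prediction). The `Task` and `Session` classes and the wrong "San Francisco" prediction are hypothetical; the task types and field names come from the earlier slides.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskType(Enum):
    PREDICT = "predict"
    COMPUTE = "compute"
    ANNOTATE = "annotate"
    REVIEW = "review"
    OVERRIDE = "override"


@dataclass
class Task:
    type: TaskType
    output: dict


@dataclass
class Session:
    tasks: list = field(default_factory=list)

    def complete(self, task):
        self.tasks.append(task)

    def result(self):
        # Assumption: later tasks win on conflicting fields.
        merged = {}
        for task in self.tasks:
            merged.update(task.output)
        return merged


session = Session()
session.complete(Task(TaskType.PREDICT,
                      {"intent": "schedule", "location": "San Francisco"}))
session.complete(Task(TaskType.ANNOTATE,
                      {"location": "Palo Alto", "channel": "coffee",
                       "time-pref": "next week"}))
session.complete(Task(TaskType.COMPUTE,
                      {"state": "new", "action": "suggest times"}))
output = session.result()
print(output)
```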
23. Automation is a spectrum
cost and speed gains without full automation requires:
tightly integrate worker-operations, machine learning, and UX-design
leverage task differences
tasks fully to partially automated
match task difficulty with processing skill (person or machine)
[Figure: avg throughput by hour, Feb to Apr, for all sessions and for sessions with worker; labeled levels: 1x, ~1.4x, ~1.8x]
24. Distributing a knowledge-workforce
constraints (requirements) | challenges
bounded processing time | people are naturally slow at data entry tasks
bounded processing cost | people hours are more expensive than cpu cycles/hr
workforce size | learning the platform
high accuracy bar | people are naturally noisy at data entry
workforce elasticity | “staffing on a dime”
25. Worker lifecycle
self-directed program
build commitment to platform through promotion as credentials increase
[Figure: worker lifecycle from candidate to worker; stages: sourcing, qualification, sandbox, production (beginner, expert), worker support, customer support; exit platform]
• english comprehension
• other testing
• hours practice
• exceeds threshold
exit platform if accuracy drops below threshold for too long
27. Task assignment
[Figure: tasks ordered from easiest to hard, with the easiest tasks automatable; axes run from high confidence to less confidence and from low impact to high impact (customer experience)]
pre-filled prediction/automation
easy to annotate & understand
very manual
a lot of context to review
annotation, review judgement-call
high leverage new annotation
low leverage new annotation
“A tutorial on active learning” S. Dasgupta and J. Langford 2009
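One way to read this slide is as a routing rule over model confidence and customer impact, plus an active-learning step that sends the least certain items to people. The sketch below is a guess at that shape, not the system described in the talk: the queue names, the thresholds, and both functions are hypothetical, and `most_uncertain` is plain uncertainty sampling for a binary prediction.

```python
def route(confidence, impact, auto_threshold=0.95, review_threshold=0.7):
    """Pick a queue for a predicted task.

    confidence: model confidence in [0, 1]
    impact: "low" or "high" effect on customer experience
    """
    if confidence >= auto_threshold and impact == "low":
        return "automate"   # use the prediction as-is
    if confidence >= review_threshold:
        return "review"     # pre-filled prediction, a person confirms it
    return "annotate"       # a person annotates from scratch


def most_uncertain(predictions, k):
    """Uncertainty sampling: the k items with confidence closest to 0.5."""
    return sorted(predictions, key=lambda p: abs(p[1] - 0.5))[:k]


print(route(0.99, "low"), route(0.99, "high"), route(0.5, "low"))
print(most_uncertain([("a", 0.9), ("b", 0.55), ("c", 0.1)], 1))
```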
32. Work recycling
[Figure: task queues: easiest, 1st escalate, 2nd escalate]
recycle the first queue n times
beginners may escalate when unsure (equivalent to “skip” in this case)
# of recycles = proxy to task difficulty
works well with incentive to avoid mistakes
other good strategies
“Double or Nothing: Multiplicative Incentive Mechanisms for Crowdsourcing” by N. Shah, D. Zhou, 2014.
allow worker to “skip” tasks
reward based on known-examples:
no pay for skipping
high penalty for incorrect labels
high reward for correct labels
incentivizes skipping if worker confidence is low
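A short expected-value calculation shows why this reward scheme pushes low-confidence workers to skip. This is a simplified additive model for illustration only, not the multiplicative mechanism from the cited paper; the function names and the reward and penalty values are hypothetical.

```python
def expected_pay(confidence, reward, penalty):
    """Expected pay for answering, given the worker's confidence in the label."""
    return confidence * reward - (1.0 - confidence) * penalty


def should_skip(confidence, reward, penalty):
    """Skipping pays 0, so skip whenever answering has negative expected pay."""
    return expected_pay(confidence, reward, penalty) < 0.0


def break_even_confidence(reward, penalty):
    """Confidence at which answering and skipping pay the same."""
    return penalty / (reward + penalty)


# With a penalty three times the reward, answering only pays above 75% confidence.
print(break_even_confidence(1.0, 3.0))
print(should_skip(0.6, 1.0, 3.0), should_skip(0.9, 1.0, 3.0))
```

Raising the penalty relative to the reward raises the break-even confidence, which is the lever the slide describes: a high penalty for incorrect labels makes skipping the rational choice when unsure.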
33. Distributing a knowledge-workforce
constraints (requirements) | challenges
bounded processing time | people are naturally slow at data entry tasks
bounded processing cost | people hours are more expensive than cpu cycles/hr
workforce size | learning the platform
high accuracy bar | people are naturally noisy at data entry
workforce elasticity | “staffing on a dime”
34. Incentives
competing incentives drive both throughput and accuracy
worker throughput: per-task payment
worker accuracy: time preference (by ranked accuracy*)
*workers must also maintain a minimum accuracy to remain on the platform
35. Staffing: Supply and demand
supply (workers): requested hours (all potential weekly supply), worker reliability, worker throughput, worker accuracy
demand (customer requests): predicted demand/hour
hour assignments: Jodi’s 3-pass assignment algorithm
try to match supply with predicted demand
when supply >> demand: workers unhappy, not enough work
dynamically bias tasks toward more accurate workers
e.g., higher accuracy implies higher probability of picking up a task
when supply << demand: customers unhappy, quality of service goes down
workers find out about “pick up” work via turtlebot in community Slack channel
pay-rate a function of demand (think: surge pricing)
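The idea that higher accuracy implies a higher probability of picking up a task can be sketched as weighted random assignment. This is an assumption about one simple way to do it, not the assignment algorithm named on the slide; the worker names and accuracy values are made up, and weighting directly by accuracy is only one possible choice.

```python
import random
from collections import Counter


def pick_worker(accuracy_by_worker, rng):
    """Choose a worker with probability proportional to measured accuracy."""
    workers = list(accuracy_by_worker)
    weights = [accuracy_by_worker[w] for w in workers]
    return rng.choices(workers, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
accuracy = {"worker_a": 0.99, "worker_b": 0.90, "worker_c": 0.60}
counts = Counter(pick_worker(accuracy, rng) for _ in range(10_000))
print(counts.most_common())
```

Raw accuracy gives only a mild bias. A steeper weighting, such as a power of accuracy, would concentrate work on the most accurate workers at the cost of starving the rest.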
38. Measuring accuracy: Compare N workers
[Figure: from a queue of tasks (task 1, task 2, task 3), task 3 is replicated to workers i, j, and k; their answers are aggregated into a task 3 estimate, which gives a task 3 accuracy for each worker]
“Quality Management on Amazon Mechanical Turk” by P. Ipeirotis, F. Provost, and J. Wang, 2010.
Inter-annotator agreement
Cohen/Fleiss kappa: $\kappa := 1 - \frac{1 - p_{\text{observed agreement}}}{1 - p_{\text{chance agreement}}}$
Jointly estimate underlying value & annotator accuracy / bias (over many tasks)
“Spectral Methods Meet EM: A Provably Optimal Algorithm for Crowdsourcing”
by Y. Zhang, X. Chen, D. Zhou, and M. Jordan, 2014.
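For two annotators, the kappa statistic on this slide can be computed directly from paired labels. The sketch below implements Cohen's kappa (the two-rater case) using the standard definition of chance agreement from each rater's label frequencies; the function name and the sample labels are illustrative.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each annotator labeled independently
    # according to their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return 1.0 - (1.0 - p_observed) / (1.0 - p_chance)


worker_i = ["yes", "yes", "no", "no"]
worker_j = ["yes", "no", "no", "no"]
print(cohen_kappa(worker_i, worker_j))
```

In this example the observed agreement is 0.75 and the chance agreement is 0.5, so kappa is 0.5.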
39. Community is essential
we love the CRAs!!
“Clara Remote Assistant” (CRA)
community manager
runs worker support
builds platform documentation
community pulse surveys
promote the needs of the community
community Slack channel
ask/answer questions, pictures of pets, turtlebot, etc.
beta launches
CRAs love trying new tools before they roll out
we get amazing feedback
[realwaystoearnmoneyonline.com]
[workfromhomehappines.com]
41. Nice problems: Smart annotation
Inspiration from “Parsing Time: Learning to interpret time expressions” G. Angeli, C. Manning, D. Jurafsky 2012
detect, review, annotate
tag (rnn tagger), resolve (DAG composer)
input: 1 pm on Friday
token tags: HOUR, MERIDIEM, DAY_OF_WEEK, combined with F_INTERSECT
prediction: F, (8/11/2017,), (13:00,)
other figure labels: recurrence, intent, reference time, token, annotation
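The tag-then-resolve flow can be illustrated with a small rule-based stand-in. This is not the RNN tagger or DAG composer from the slide: the tagger here is a lookup table, the resolver handles only an hour, a meridiem, and a weekday, and the reference date of Monday 8/7/2017 is an assumption chosen so the result matches the prediction shown.

```python
from datetime import date, datetime, time, timedelta

WEEKDAYS = {"monday": 0, "tuesday": 1, "wednesday": 2, "thursday": 3,
            "friday": 4, "saturday": 5, "sunday": 6}


def tag(tokens):
    """Label each token (a lookup-table stand-in for a learned tagger)."""
    tagged = []
    for tok in tokens:
        low = tok.lower()
        if low.isdigit():
            tagged.append((low, "HOUR"))
        elif low in ("am", "pm"):
            tagged.append((low, "MERIDIEM"))
        elif low in WEEKDAYS:
            tagged.append((low, "DAY_OF_WEEK"))
        else:
            tagged.append((low, "OTHER"))
    return tagged


def resolve(tagged, reference):
    """Intersect the day-of-week and time-of-day against a reference date."""
    values = {label: tok for tok, label in tagged}
    if "HOUR" not in values or "DAY_OF_WEEK" not in values:
        raise ValueError("need both an hour and a day of week")
    hour = int(values["HOUR"])
    if values.get("MERIDIEM") == "pm" and hour != 12:
        hour += 12
    if values.get("MERIDIEM") == "am" and hour == 12:
        hour = 0
    # First matching weekday on or after the reference date.
    days_ahead = (WEEKDAYS[values["DAY_OF_WEEK"]] - reference.weekday()) % 7
    return datetime.combine(reference + timedelta(days=days_ahead), time(hour))


resolved = resolve(tag("1 pm on Friday".split()), date(2017, 8, 7))
print(resolved)
```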
42. Nice problems: Smart annotation
“I prefer the call to be in the morning and I usually finish gym around 8:30 am, shower and breakfast may take another 40 minutes. My meetings usually start after 11 am, so any time in between for the morning would be great.”
(9:10, 11:00), available
(11:00, 23:59), unavailable
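The available window follows from simple arithmetic on the message: 8:30 am plus 40 minutes gives 9:10, and meetings start at 11:00. A minimal sketch of that step, assuming the times have already been extracted from the text (the helper name and the annotation layout are hypothetical):

```python
from datetime import datetime, timedelta


def add_minutes(hhmm, minutes):
    """Add minutes to an 'HH:MM' time and return it in the same format."""
    shifted = datetime.strptime(hhmm, "%H:%M") + timedelta(minutes=minutes)
    return shifted.strftime("%H:%M")


earliest = add_minutes("08:30", 40)  # gym ends 8:30, then 40 minutes
annotation = [
    ((earliest, "11:00"), "available"),
    (("11:00", "23:59"), "unavailable"),
]
print(annotation)
```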
43. Nice problems: Smart annotation
a) binary label
b) select from ranked list
$\text{score}(c) = \sum_i w_i \, s_i(c)$, where $w_i$ is a weight, $s_i$ is a weak scoring function, and $c$ is a time candidate
weak scoring functions:
outstanding suggestions
business/lunch hours
how far in the future
travel distance
can cluster meeting
annotations: is range; range or start-time
interval | score
1-3pm, W | 0.987
1-1:30pm, Th | 0.782
9:30-10am, Th | 0.72
9-11am, M | 0.63
6:30-7pm, M | 0.51
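The score on this slide is a weighted sum of weak scoring functions evaluated on each time candidate, and the ranked list comes from sorting by that score. The sketch below shows the mechanics only: the two scorers, their weights, and the candidates are made up, so the numbers will not match the scores on the slide.

```python
def score(candidate, weighted_scorers):
    """Weighted sum of weak scoring functions for one time candidate."""
    return sum(w * s(candidate) for w, s in weighted_scorers)


def rank(candidates, weighted_scorers):
    """Order candidates from highest to lowest score."""
    return sorted(candidates, key=lambda c: score(c, weighted_scorers),
                  reverse=True)


# Hypothetical weak scorers, each returning a value in [0, 1].
def in_business_hours(c):
    return 1.0 if 9 <= c["start_hour"] and c["end_hour"] <= 17 else 0.0


def soon(c):
    return 1.0 / (1.0 + c["days_ahead"])


scorers = [(0.7, in_business_hours), (0.3, soon)]
candidates = [
    {"label": "7-8pm, M", "start_hour": 19, "end_hour": 20, "days_ahead": 0},
    {"label": "2-3pm, F", "start_hour": 14, "end_hour": 15, "days_ahead": 4},
    {"label": "10-11am, Tu", "start_hour": 10, "end_hour": 11, "days_ahead": 1},
]
ranked = rank(candidates, scorers)
print([(c["label"], round(score(c, scorers), 3)) for c in ranked])
```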
48. Additional References
“Calendar.help: Designing a workflow-based scheduling agent with humans in the loop”
J. Cranshaw, E. Elwany, T. Newman, R. Kocielnik, B. Yu, S. Soni, J. Teevan, A. Monroy-Hernández (Microsoft) 2017
“Building human-assisted AI applications”
A. Marcus (B12) 2016
“Real-world active learning: Applications and strategies for human-in-the-loop machine learning”
T. Cuzzillo (CrowdFlower) 2015
“Language-independent discriminative parsing of temporal expressions”
G. Angeli, J. Uszkoreit 2013
“On scheduling events and tasks by an intelligent calendar assistant”
I. Refanidis and N. Yorke-Smith 2009
“A data quality metric: How to Estimate the Number of Undetected Errors in Data Sets”
Y. Chung, S. Krishnan, and T. Kraska 2017