COMMUNITY PROFILING FOR CROWDSOURCING QUERIES

Khalid Belhajjame (1), Marco Brambilla (2), Daniela Grigori (1), Andrea Mauri (2)
(1) PSL, Paris-Dauphine University, LAMSADE, France
(2) Politecnico di Milano, Italy

SOCM’14, Monday, April 7
Traditional vs Community Crowdsourcing
• General structure:
  • the requestor poses some questions
  • a wide set of responders (typically unknown to the requestor) provides answers
  • the system organizes a response-collection campaign
• Traditional crowdsourcing:
  • cost–quality tradeoff
  • complex result aggregation
• Community crowdsourcing:
  • matching the task to the “correct” group of workers
Community
A set of people that share:
• interests
• features
…or that belong to:
• a common entity
• a social network
Leveraging communities
• Why?
  • Experts
  • More engaged workers
• How?
  • Determine the communities of performers
  • Target the correct community
  • Monitor them, taking into account the behavior of their members
The approach
• Models
  • Query model
  • Community model
• Matching strategies
  • Keyword-based
  • Semantic-based
Query Model
• Textual description of the task
• Examples of queries and responses
• Knowledge needed
  • Prior knowledge (a knowledge base) that can be used to answer the
    query partially or to identify potential answers
• Type of the task
  • Unary: tag, classify, like, …
  • N-ary: match, cluster, …
• Objects
  • Kind, description, text, metadata, …
• Temporal aspects
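To make the query model above concrete, here is a minimal data-structure sketch; the field names (description, examples, task_type, …) are illustrative assumptions, not a schema taken from the paper:

```python
from dataclasses import dataclass, field
from enum import Enum

class TaskType(Enum):
    UNARY = "unary"   # tag, classify, like, ...
    NARY = "n-ary"    # match, cluster, ...

@dataclass
class CrowdQuery:
    """Hypothetical container for the dimensions of the query model."""
    description: str                                   # textual description of the task
    examples: list[tuple[str, str]]                    # example (query, response) pairs
    knowledge_base: dict | None = None                 # prior knowledge for partial answers
    task_type: TaskType = TaskType.UNARY
    objects: list[dict] = field(default_factory=list)  # kind, description, text, metadata
    deadline: str | None = None                        # temporal aspect, e.g. an ISO date
```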
Community Model
• Textual description of the community
  • name, web page, …
• Type of the community
  • Explicit: statically existing and consolidated
  • Implicit: dynamically built based on the need
• Definition
  • Intensional: defined by a property
  • Extensional: defined by a list of members
  • Both
• Grouping factor
  • Friendship, interest, location, expertise, affiliation
Community Model (continued)
• Content
  • Produced by the members of the community
• Members’ profiles
  • Explicit
  • Implicit
• Communication channel
  • Email, Facebook, LinkedIn, Twitter, blogs or web sites (reviews,
    expert sites), AMT
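Analogously, a hedged sketch of the community model as a data structure; again, every field name is an assumption made for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Community:
    """Hypothetical container for the dimensions of the community model."""
    name: str
    web_page: str | None = None
    explicit: bool = True                              # consolidated vs. built on demand
    members: list[str] = field(default_factory=list)   # extensional definition
    membership_rule: str | None = None                 # intensional definition (a property)
    grouping_factor: str = "interest"                  # friendship, location, expertise, ...
    channels: list[str] = field(default_factory=list)  # email, Twitter, AMT, ...
    content: list[str] = field(default_factory=list)   # texts produced by the members
```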
Relations between Communities
• Subsumption
  • A given community contains another community
  • e.g., sport fans contains soccer fans
• Similarity
  • Two communities refer to similar expertise or topics
  • e.g., experts in classical music and experts in opera
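Building on the hypothetical Community sketch above, an extensional subsumption check reduces to set containment over the member lists (similarity would instead compare the communities' textual profiles, as in the matching strategies that follow):

```python
def subsumes(parent: Community, child: Community) -> bool:
    """Extensional subsumption check: every member of `child` also belongs
    to `parent` (e.g. soccer fans are contained in sport fans)."""
    return set(child.members) <= set(parent.members)
```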
Matching
• Keyword-based (see the sketch below)
  • Communities and query are treated as bags of words
  • Requires indexing
• Semantic-based
  • Communities and query are mapped to concepts
  • Requires semantic annotation
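As a concrete illustration of keyword-based matching, one could index the community descriptions as TF-IDF bags of words and rank them by cosine similarity to the query text. This is a sketch using scikit-learn, not the paper's implementation; the community descriptions are invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_communities(query_text: str, communities: dict[str, str]) -> list[tuple[str, float]]:
    """Rank communities by cosine similarity between their TF-IDF bag of
    words and the query's, i.e. keyword-based matching."""
    names = list(communities)
    matrix = TfidfVectorizer(stop_words="english").fit_transform(
        [communities[n] for n in names] + [query_text])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)

# Example: a soccer-tagging query should match the soccer community best.
print(rank_communities(
    "tag images of soccer players and stadiums",
    {"soccer fans": "soccer football players matches stadiums teams",
     "opera experts": "opera classical music singers arias theatres"}))
```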
Community Control
Community control consists in adapting the crowdsourcing campaign
according to the behavior of the community:
• Task / object allocation (granularity)
• Static / dynamic
CrowdSearcher
A prototype that allows the definition, execution, and control of a
crowdsourcing campaign.
http://crowdsearcher.search-computing.org/
Example (dynamic control)

[Figure: data model of the image-classification campaign. The Task, Object,
Performer, and Community entities are each paired with a control entity
(Task Control, Object Control, Performer Control, Community Control)
carrying fields such as Score, Enabled, Status, and execution counters;
each micro-task execution (μTObjExecution) records the performer, the
object, the answer, and start/end timestamps.]

A reactive control rule over this model:

    e: AFTER UPDATE FOR μTObjExecution
    c: CommunityControl[CommunityID == NEW.CommunityID].score <= 0.5
       CommunityControl[CommunityID == NEW.CommunityID].eval = 10
    a: SET CommunityControl[CommunityID == DB-Group].Enabled = true

The rule fires after every micro-task execution: when the executing
community's score has dropped to 0.5 or below after 10 evaluations, the
action enables an alternative community (here, the DB group).
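A minimal Python sketch of how such a reactive rule could be evaluated; the in-memory table, community names, and threshold values below mirror the rule above but are otherwise assumptions, not the CrowdSearcher engine:

```python
# Hypothetical in-memory stand-in for the CommunityControl table above.
community_control = {
    "AI-Group": {"score": 0.45, "eval": 10, "enabled": True},
    "DB-Group": {"score": 0.0,  "eval": 0,  "enabled": False},
}

def on_micro_task_update(community_id: str, fallback_id: str = "DB-Group") -> None:
    """Event: fired after each update of a micro-task execution.
    Condition: the executing community's score has fallen to <= 0.5
    after 10 evaluations. Action: enable the fallback community."""
    row = community_control[community_id]
    if row["score"] <= 0.5 and row["eval"] >= 10:
        community_control[fallback_id]["enabled"] = True

on_micro_task_update("AI-Group")
assert community_control["DB-Group"]["enabled"]
```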
Experiment
• 16 professors within two research groups in our department (the DB
  and AI groups)
• The top 50 images returned by the Google Image API for each query
• Each expert has to evaluate 5 images at a time
• Results are accepted when enough agreement on the class of the image
  is reached (a sketch of one possible agreement rule follows below)
• Evaluated objects are removed from new executions
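The slides do not specify the exact aggregation, so this is a hedged sketch of one plausible agreement rule: majority voting with a minimum number of answers and a minimum share for the winning label.

```python
from collections import Counter

def accept_if_agreement(answers: list[str], min_answers: int = 3,
                        min_share: float = 0.66) -> str | None:
    """Accept an object's classification once enough performers agree:
    at least `min_answers` answers, with the top label holding at least
    `min_share` of them. Returns the accepted label, or None to keep
    collecting answers."""
    if len(answers) < min_answers:
        return None
    label, votes = Counter(answers).most_common(1)[0]
    return label if votes / len(answers) >= min_share else None

# Once accepted, the object is removed from subsequent executions.
assert accept_if_agreement(["relevant", "relevant", "not relevant"]) == "relevant"
```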
Communities
The communities are:
• the research group of the professor,
• the research area containing the group (e.g., Computer Science),
• the whole department (which counts more than 600 people in different
  areas).
Invitations are sent in two ways (sketched below):
• inside-out: we started with invitations to experts, i.e., people in
  the same groups as the professor (DB and AI), then expanded
  invitations to Computer Science, then to the whole department, and
  finally to open social networks (Alumni and PhD communities on
  Facebook and LinkedIn);
• outside-in: we proceeded the opposite way, starting with the
  department members, then restricting to computer scientists, and
  finally to the groups' members.
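A hedged sketch of the inside-out expansion as a loop; the ring names, the waiting period, and the stopping test are all assumptions for illustration (the slides do not state how or when the expansion was triggered):

```python
import time

# Community rings ordered from most expert to most generic (names assumed).
INSIDE_OUT = ["research group", "research area", "department", "social network"]

def expand_invitations(rings, invite, has_enough_answers, wait_seconds=86400):
    """Inside-out strategy: invite one ring at a time, expanding to the
    next, more generic ring only while answers are still lacking."""
    for community in rings:
        invite(community)
        time.sleep(wait_seconds)      # give the current ring time to respond
        if has_enough_answers():
            return

# Outside-in is the same loop over the reversed ring order:
# expand_invitations(list(reversed(INSIDE_OUT)), invite, has_enough_answers)
```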
Number of performers per community

[Figure: number of performers over time, from 7/18/2013 to 7/28/2013
(y-axis: # performers, 0 to 70), with one line per community: research
group, research area, department, social network, and total; the shares of
the largest communities are labeled 46%, 24%, and 16%.]
Precision of performers per community

[Figure: precision (0 to 1) vs. number of evaluations (0 to 3000), with one
line per community: research group, research area, department, social
network, and total.]
Precision of the evaluated objects
• Precision decreases for less expert communities
• The inside-out strategy (from expert to generic users) outperforms the
  outside-in strategy (from generic to expert users)

[Figure: precision (0.6 to 1) vs. number of closed objects (0 to 800),
comparing the main experiment (inside-out invitations) with the reverse
(outside-in) invitations.]
General observations
• A given community of workers can be broken down into (possibly
  overlapping) sub-communities with different expertise.
• Experts from a community feel more engaged with the task:
  • they are more demanding about the quality of the application UI and
    of the evaluated objects;
  • they provide feedback on the application, the questions, and the
    evaluated objects, e.g. “How is it possible that this image is
    related to me?!”
Conclusions
• Communities can be effectively used for tasks that require domain
  expertise
Open questions:
• How to deal with tasks requiring multiple areas of expertise?
• How to build a knowledge base that allows profiling both communities
  and queries in an optimal way?
• How to cope with the dynamics over time of:
  • communities and tasks (changing needs);
  • communities and worker expertise?
Thanks for your attention
Any questions?

Contacts
• Khalid Belhajjame: Khalid.Belhajjame@dauphine.fr
• Marco Brambilla: Marco.Brambilla@polimi.it
• Daniela Grigori: Daniela.Grigori@dauphine.fr
• Andrea Mauri: Andrea.Mauri@polimi.it
http://crowdsearcher.search-computing.org/
References
• Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Andrea Mauri, Riccardo
  Volonterio. 2014. Pattern-Based Specification of Crowdsourcing
  Applications. In Proceedings of the 14th International Conference on Web
  Engineering (ICWE 2014), 218-235.
• Marco Brambilla, Stefano Ceri, Andrea Mauri, Riccardo Volonterio. 2014.
  Community-based Crowdsourcing. In The 2nd International Workshop on the
  Theory and Practice of Social Machines, Proceedings of the 23rd
  International Conference on World Wide Web (Companion Volume).
• Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Andrea Mauri. 2013.
  Reactive Crowdsourcing. In Proceedings of the 22nd International
  Conference on World Wide Web (WWW 2013).
• Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Matteo Silvestri,
  Giuliano Vesci. 2013. Choosing the Right Crowd: Expert Finding in Social
  Networks. In Proceedings of the 16th International Conference on
  Extending Database Technology (EDBT 2013). ACM, USA, 637-648.
• Alessandro Bozzon, Marco Brambilla, and Stefano Ceri. 2012. Answering
  Search Queries with CrowdSearcher. In Proceedings of the 21st
  International Conference on World Wide Web (WWW 2012). ACM, New York,
  NY, USA, 1009-1018.
• Alessandro Bozzon, Marco Brambilla, Andrea Mauri. 2012. A Model-Driven
  Approach for Crowdsourcing Search. In Proceedings of the 1st
  International Workshop on Crowdsourcing Web Search (CrowdSearch 2012),
  co-located with WWW 2012.