COMMUNITY PROFILING FOR CROWDSOURCING QUERIES

Khalid Belhajjame (1), Marco Brambilla (2), Daniela Grigori (1), Andrea Mauri (2)
(1) PSL, Paris-Dauphine University, LAMSADE, France
(2) Politecnico di Milano, Italy

SOCM’14, Monday, April 7
Traditional vs Community Crowdsourcing
• General structure:
  • the requestor poses some questions
  • a wide set of responders (typically unknown to the requestor) provides answers
  • the system organizes a response-collection campaign
• Traditional crowdsourcing:
  • cost–quality tradeoff
  • complex result aggregation
• Community crowdsourcing:
  • matching the task to the “correct” group of workers
Community
A set of people that share:
• interests
• features
…or that belong to:
• a common entity
• a social network
Leveraging communities
• Why?
  • Experts
  • More engaged workers
• How?
  • Determine the communities of performers
  • Target the correct community
  • Monitor them, taking into account the behavior of their members
The approach
• Models
  • Query model
  • Community model
• Matching strategies
  • Keyword-based
  • Semantic-based
Query Model
• Textual description of the task
• Examples of queries and responses
• Knowledge needed
  • Prior knowledge (a knowledge base) that can be used to answer the
    query partially or to identify potential answers
• Type of the task
  • Unary: tag, classify, like, …
  • N-ary: match, cluster, …
• Objects
  • Kind, description, text, metadata, …
• Temporal aspects
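To make the query model above concrete, here is a minimal data-structure sketch; the field names (description, examples, task_type, …) are illustrative assumptions, not a schema taken from the paper:

```python
from dataclasses import dataclass, field
from enum import Enum

class TaskType(Enum):
    UNARY = "unary"   # tag, classify, like, ...
    NARY = "n-ary"    # match, cluster, ...

@dataclass
class CrowdQuery:
    """Hypothetical container for the dimensions of the query model."""
    description: str                                   # textual description of the task
    examples: list[tuple[str, str]]                    # example (query, response) pairs
    knowledge_base: dict | None = None                 # prior knowledge for partial answers
    task_type: TaskType = TaskType.UNARY
    objects: list[dict] = field(default_factory=list)  # kind, description, text, metadata
    deadline: str | None = None                        # temporal aspect, e.g. an ISO date
```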
Community Model
• Textual description of the community
  • name, web page, …
• Type of the community
  • Explicit: statically existing and consolidated
  • Implicit: dynamically built based on the need
• Definition
  • Intensional: defined by a property
  • Extensional: defined by a list of members
  • Both
• Grouping factor
  • Friendship, interest, location, expertise, affiliation
Community Model (continued)
• Content
  • Produced by the members of the community
• Members’ profiles
  • Explicit
  • Implicit
• Communication channel
  • Email, Facebook, LinkedIn, Twitter, blogs or web sites (reviews,
    expert sites), AMT
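Analogously, a hedged sketch of the community model as a data structure; again, every field name is an assumption made for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Community:
    """Hypothetical container for the dimensions of the community model."""
    name: str
    web_page: str | None = None
    explicit: bool = True                              # consolidated vs. built on demand
    members: list[str] = field(default_factory=list)   # extensional definition
    membership_rule: str | None = None                 # intensional definition (a property)
    grouping_factor: str = "interest"                  # friendship, location, expertise, ...
    channels: list[str] = field(default_factory=list)  # email, Twitter, AMT, ...
    content: list[str] = field(default_factory=list)   # texts produced by the members
```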
Relations between Communities
• Subsumption
  • A given community contains another community
  • e.g., sport fans contains soccer fans
• Similarity
  • Two communities refer to similar expertise or topics
  • e.g., experts in classical music and experts in opera
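Building on the hypothetical Community sketch above, an extensional subsumption check reduces to set containment over the member lists (similarity would instead compare the communities' textual profiles, as in the matching strategies that follow):

```python
def subsumes(parent: Community, child: Community) -> bool:
    """Extensional subsumption check: every member of `child` also belongs
    to `parent` (e.g. soccer fans are contained in sport fans)."""
    return set(child.members) <= set(parent.members)
```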
Matching
• Keyword-based (see the sketch below)
  • Communities and query are treated as bags of words
  • Requires indexing
• Semantic-based
  • Communities and query are mapped to concepts
  • Requires semantic annotation
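As a concrete illustration of keyword-based matching, one could index the community descriptions as TF-IDF bags of words and rank them by cosine similarity to the query text. This is a sketch using scikit-learn, not the paper's implementation; the community descriptions are invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_communities(query_text: str, communities: dict[str, str]) -> list[tuple[str, float]]:
    """Rank communities by cosine similarity between their TF-IDF bag of
    words and the query's, i.e. keyword-based matching."""
    names = list(communities)
    matrix = TfidfVectorizer(stop_words="english").fit_transform(
        [communities[n] for n in names] + [query_text])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)

# Example: a soccer-tagging query should match the soccer community best.
print(rank_communities(
    "tag images of soccer players and stadiums",
    {"soccer fans": "soccer football players matches stadiums teams",
     "opera experts": "opera classical music singers arias theatres"}))
```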
Community Control
Community control consists in adapting the crowdsourcing campaign
according to the behavior of the community:
• Task / object allocation (granularity)
• Static / dynamic
CrowdSearcher
A prototype that allows the definition, execution, and control of a
crowdsourcing campaign.
http://crowdsearcher.search-computing.org/
Example (dynamic control)

[Figure: data model of the image-classification campaign. The Task, Object,
Performer, and Community entities are each paired with a control entity
(Task Control, Object Control, Performer Control, Community Control)
carrying fields such as Score, Enabled, Status, and execution counters;
each micro-task execution (μTObjExecution) records the performer, the
object, the answer, and start/end timestamps.]

A reactive control rule over this model:

    e: AFTER UPDATE FOR μTObjExecution
    c: CommunityControl[CommunityID == NEW.CommunityID].score <= 0.5
       CommunityControl[CommunityID == NEW.CommunityID].eval = 10
    a: SET CommunityControl[CommunityID == DB-Group].Enabled = true

The rule fires after every micro-task execution: when the executing
community's score has dropped to 0.5 or below after 10 evaluations, the
action enables an alternative community (here, the DB group).
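A minimal Python sketch of how such a reactive rule could be evaluated; the in-memory table, community names, and threshold values below mirror the rule above but are otherwise assumptions, not the CrowdSearcher engine:

```python
# Hypothetical in-memory stand-in for the CommunityControl table above.
community_control = {
    "AI-Group": {"score": 0.45, "eval": 10, "enabled": True},
    "DB-Group": {"score": 0.0,  "eval": 0,  "enabled": False},
}

def on_micro_task_update(community_id: str, fallback_id: str = "DB-Group") -> None:
    """Event: fired after each update of a micro-task execution.
    Condition: the executing community's score has fallen to <= 0.5
    after 10 evaluations. Action: enable the fallback community."""
    row = community_control[community_id]
    if row["score"] <= 0.5 and row["eval"] >= 10:
        community_control[fallback_id]["enabled"] = True

on_micro_task_update("AI-Group")
assert community_control["DB-Group"]["enabled"]
```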
Experiment
• 16 professors within two research groups in our department (the DB
  and AI groups)
• The top 50 images returned by the Google Image API for each query
• Each expert has to evaluate 5 images at a time
• Results are accepted when enough agreement on the class of the image
  is reached (a sketch of one possible agreement rule follows below)
• Evaluated objects are removed from new executions
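The slides do not specify the exact aggregation, so this is a hedged sketch of one plausible agreement rule: majority voting with a minimum number of answers and a minimum share for the winning label.

```python
from collections import Counter

def accept_if_agreement(answers: list[str], min_answers: int = 3,
                        min_share: float = 0.66) -> str | None:
    """Accept an object's classification once enough performers agree:
    at least `min_answers` answers, with the top label holding at least
    `min_share` of them. Returns the accepted label, or None to keep
    collecting answers."""
    if len(answers) < min_answers:
        return None
    label, votes = Counter(answers).most_common(1)[0]
    return label if votes / len(answers) >= min_share else None

# Once accepted, the object is removed from subsequent executions.
assert accept_if_agreement(["relevant", "relevant", "not relevant"]) == "relevant"
```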
Communities
The communities are:
• the research group of the professor,
• the research area containing the group (e.g., Computer Science),
• the whole department (which counts more than 600 people in different
  areas).
Invitations are sent in two ways (sketched below):
• inside-out: we started with invitations to experts, i.e., people in
  the same groups as the professor (DB and AI), then expanded
  invitations to Computer Science, then to the whole department, and
  finally to open social networks (Alumni and PhD communities on
  Facebook and LinkedIn);
• outside-in: we proceeded the opposite way, starting with the
  department members, then restricting to computer scientists, and
  finally to the groups' members.
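A hedged sketch of the inside-out expansion as a loop; the ring names, the waiting period, and the stopping test are all assumptions for illustration (the slides do not state how or when the expansion was triggered):

```python
import time

# Community rings ordered from most expert to most generic (names assumed).
INSIDE_OUT = ["research group", "research area", "department", "social network"]

def expand_invitations(rings, invite, has_enough_answers, wait_seconds=86400):
    """Inside-out strategy: invite one ring at a time, expanding to the
    next, more generic ring only while answers are still lacking."""
    for community in rings:
        invite(community)
        time.sleep(wait_seconds)      # give the current ring time to respond
        if has_enough_answers():
            return

# Outside-in is the same loop over the reversed ring order:
# expand_invitations(list(reversed(INSIDE_OUT)), invite, has_enough_answers)
```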
Number of performers per community

[Figure: number of performers over time, from 7/18/2013 to 7/28/2013
(y-axis: # performers, 0 to 70), with one line per community: research
group, research area, department, social network, and total; the shares of
the largest communities are labeled 46%, 24%, and 16%.]
Precision of performers per community

[Figure: precision (0 to 1) vs. number of evaluations (0 to 3000), with one
line per community: research group, research area, department, social
network, and total.]
Precision of the evaluated objects
• Precision decreases for less expert communities
• The inside-out strategy (from expert to generic users) outperforms the
  outside-in strategy (from generic to expert users)

[Figure: precision (0.6 to 1) vs. number of closed objects (0 to 800),
comparing the main experiment (inside-out invitations) with the reverse
(outside-in) invitations.]
General observations
• A given community of workers can be broken down into (possibly
  overlapping) sub-communities with different expertise.
• Experts from a community feel more engaged with the task:
  • they are more demanding about the quality of the application UI and
    of the evaluated objects;
  • they provide feedback on the application, the questions, and the
    evaluated objects, e.g. “How is it possible that this image is
    related to me?!”
Conclusions
• Communities can be effectively used for tasks that require domain
  expertise
Open questions:
• How to deal with tasks requiring multiple areas of expertise?
• How to build a knowledge base that allows profiling both communities
  and queries in an optimal way?
• How to cope with the dynamics over time of:
  • communities and tasks (changing needs);
  • communities and worker expertise?
Thanks for your attention
Any questions?

Contacts
• Khalid Belhajjame: Khalid.Belhajjame@dauphine.fr
• Marco Brambilla: Marco.Brambilla@polimi.it
• Daniela Grigori: Daniela.Grigori@dauphine.fr
• Andrea Mauri: Andrea.Mauri@polimi.it
http://crowdsearcher.search-computing.org/
References
• Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Andrea Mauri, Riccardo
  Volonterio. 2014. Pattern-Based Specification of Crowdsourcing
  Applications. In Proceedings of the 14th International Conference on Web
  Engineering (ICWE 2014), 218-235.
• Marco Brambilla, Stefano Ceri, Andrea Mauri, Riccardo Volonterio. 2014.
  Community-based Crowdsourcing. In The 2nd International Workshop on the
  Theory and Practice of Social Machines, Proceedings of the 23rd
  International Conference on World Wide Web (Companion Volume).
• Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Andrea Mauri. 2013.
  Reactive Crowdsourcing. In Proceedings of the 22nd International
  Conference on World Wide Web (WWW 2013).
• Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Matteo Silvestri,
  Giuliano Vesci. 2013. Choosing the Right Crowd: Expert Finding in Social
  Networks. In Proceedings of the 16th International Conference on
  Extending Database Technology (EDBT 2013). ACM, USA, 637-648.
• Alessandro Bozzon, Marco Brambilla, and Stefano Ceri. 2012. Answering
  Search Queries with CrowdSearcher. In Proceedings of the 21st
  International Conference on World Wide Web (WWW 2012). ACM, New York,
  NY, USA, 1009-1018.
• Alessandro Bozzon, Marco Brambilla, Andrea Mauri. 2012. A Model-Driven
  Approach for Crowdsourcing Search. In Proceedings of the 1st
  International Workshop on Crowdsourcing Web Search (CrowdSearch 2012),
  co-located with WWW 2012.