Choosing the Right Crowd: Expert Finding in Social Networks (EDBT 2013)

  1. CHOOSING THE RIGHT CROWD: EXPERT FINDING IN SOCIAL NETWORKS
     Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Matteo Silvestri, Giuliano Vesci
     Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria
  2. Problems and terms
     • Human Computation: computation carried out by groups of humans (examples: collaborative filtering, online auctions, tagging, games with a purpose)
     • Crowd-sourcing: the process of building a human computation using computers as organizers, by organizing the computation as several tasks (possibly with dependencies) performed by humans
     • Crowd-searching: a specific task consisting of searching for information
     • Crowd-sourcing platform: a software system for managing tasks, capable of organizing tasks, assigning them to humans, and assembling and processing the returned results (such as Amazon Mechanical Turk, Doodle)
     • Social platform: a platform where humans perform social interactions (such as Facebook, Twitter, LinkedIn)
  3. The market
  4. Why Crowd-search?
     • People do not fully trust web search
     • They want to get direct feedback from people
     • They expect recommendations, insights, opinions, reassurance
  5. And given that crowds spend time on social networks...
     • Our proposal is to use social networks and Q&A websites as crowd-searching platforms, in addition to crowdsourcing platforms
     • Example: search tasks
       [architecture diagram: a query interface feeds a search execution and human interaction management layer, which dispatches queries to local search engines, social networks, Q&A sources, and crowd-sourcing platforms, and assembles their answers]
  6. From social workers to communities
     • Issues and problems:
       • Motivation of the responders
       • Intensity of the social activity of the asker
       • Topic appropriateness
       • Timing of the post (hour of the day, day of the week)
       • Context and language barriers
  7. Crowd-searching after conventional search
     • From search results to feedback from friends and experts
       [diagram: the initial query goes to a conventional search system; its results feed a human search system that forwards questions to several social platforms]
  8. Example: Find your next job (exploration)
  9. Example: Find your job (social invitation)
  10. Example: Find your job (social invitation). Selected data items can be transferred to the crowd question
  11. Find your job (response submission)
  12. CrowdSearcher results (in the loop)
  13. WWW 2012 – THE MODEL
  14. Task management problems
      Typical crowdsourcing problems:
      • Task splitting: the input data collection is too complex relative to the cognitive capabilities of users
      • Task structuring: the query is too complex or too critical to be executed in one shot
      • Task routing: a query can be distributed according to the values of some attribute of the collection
      Plus:
      • Platform/community assignment: a task can be assigned to different communities or social platforms based on its focus
  15. Task Design
      • Which are the input objects of the crowd interaction?
        • Should they have a schema (a set of fields, each defined by a name and a type)?
      • Which operations should the crowd perform?
        • Like, label, comment, add new instances, verify/modify data, order, etc.
      • How should the task be split into micro-tasks assigned to each person? How should a specific object be assigned to each person?
      • How should the results of the micro-tasks be aggregated?
        • Sum, average, majority voting, etc.
      • Which execution interface should be used?
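The design dimensions above can be read as a declarative task specification. The following is a minimal, purely illustrative sketch in Python; the key names and values are assumptions made for this example, not the actual CrowdSearcher configuration format.

```python
# Hypothetical task specification mirroring the design questions above.
# All field names are illustrative assumptions, not CrowdSearcher's API.
task_spec = {
    "objects": [
        {"name": "Politecnico Bar", "address": "Piazza Leonardo da Vinci 32"},
        {"name": "Trattoria Milanese", "address": "Via Santa Marta 11"},
    ],
    "schema": {"name": "string", "address": "string"},            # input object schema
    "operation": "like",                                          # like | comment | tag | classify | add | modify | order
    "splitting": {"objects_per_microtask": 3, "repetitions": 2},  # micro-task design
    "assignment": "online_first_come_first_served",               # who gets which micro-task
    "aggregation": "majority_voting",                             # how results are combined
    "interface": "facebook_app",                                  # execution interface
}
```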
  16. Operations
      • In a Task, performers are required to execute logical operations on input objects
        • e.g. Locate the faces of the people appearing in the following 5 images
      • CrowdSearcher offers pre-defined operation types:
        • Like: ask a performer to express a preference (true/false), e.g. Do you like this picture?
        • Comment: ask a performer to write a description / summary / evaluation, e.g. Can you summarize the following text using your own words?
        • Tag: ask a performer to annotate an object with a set of tags, e.g. How would you label the following image?
        • Classify: ask a performer to classify an object within a closed set of alternatives, e.g. Would you classify this tweet as pro-right, pro-left, or neutral?
        • Add: ask a performer to add a new object conforming to the specified schema, e.g. Can you list the name and address of good restaurants near Politecnico di Milano?
        • Modify: ask a performer to verify/modify the content of one or more input objects, e.g. Is this wine from Cinque Terre? If not, where does it come from?
        • Order: ask a performer to order the input objects, e.g. Order the following books according to your taste
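To make the operation catalogue concrete, here is a small illustrative sketch (the names are assumptions, not CrowdSearcher code) that pairs each pre-defined operation type with the kind of answer it collects from a performer.

```python
from enum import Enum

class Operation(Enum):
    LIKE = "like"          # boolean preference
    COMMENT = "comment"    # free-text description / summary / evaluation
    TAG = "tag"            # set of labels
    CLASSIFY = "classify"  # one label from a closed set of alternatives
    ADD = "add"            # a new object conforming to the schema
    MODIFY = "modify"      # a verified / corrected copy of an input object
    ORDER = "order"        # a permutation of the input objects

# Expected Python type of a performer's answer, useful when validating input.
ANSWER_TYPE = {
    Operation.LIKE: bool,
    Operation.COMMENT: str,
    Operation.TAG: set,
    Operation.CLASSIFY: str,
    Operation.ADD: dict,
    Operation.MODIFY: dict,
    Operation.ORDER: list,
}
```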
  17. Splitting Strategy
      • Given N objects in the task:
        • Which objects should appear in each MicroTask?
        • How many objects in each MicroTask?
        • How often should an object appear in MicroTasks?
        • Which objects cannot appear together?
        • Should objects always be presented in the same order?
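One simple way to answer these questions is a redundancy-based splitter: each object appears in a fixed number of micro-tasks, and each micro-task shows a fixed number of objects. The sketch below is an assumption for illustration, not the strategy implemented in the system; a production strategy would also enforce co-occurrence constraints and avoid repeating an object inside the same micro-task.

```python
import random

def split_into_microtasks(objects, per_task=3, repetitions=2, shuffle=True):
    """Illustrative splitting strategy: every object appears in `repetitions`
    micro-tasks and every micro-task contains up to `per_task` objects."""
    pool = [obj for obj in objects for _ in range(repetitions)]
    if shuffle:
        random.shuffle(pool)  # avoids always presenting objects in the same order
    return [pool[i:i + per_task] for i in range(0, len(pool), per_task)]
```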
  18. Assignment Strategy
      • Given a set of MicroTasks, which performers are assigned to them?
      • Online assignment
        • MicroTasks dynamically assigned to performers
        • First come / first served
        • Based on a choice of the performer
      • Offline assignment
        • MicroTasks statically assigned to performers
        • Based on performers' priority
        • Based on matching
      • Invitation
        • Send an email to a mailing list
        • Publish a HIT on Mechanical Turk (dynamic)
        • Create a new challenge in your game
        • Publish a post/tweet on your social network profile
        • Publish a post/tweet on your friends' profiles
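The two families of strategies can be contrasted with a toy sketch (illustrative only, with assumed function names): online assignment hands out the next pending micro-task to whoever asks, while offline assignment precomputes a static plan.

```python
from collections import deque

def assign_online(microtasks):
    """Online, first-come / first-served: the next performer who shows up
    receives the next pending micro-task."""
    pending = deque(microtasks)
    def next_for(performer_id):
        return pending.popleft() if pending else None
    return next_for

def assign_offline(microtasks, performers):
    """Offline: micro-tasks are statically distributed round-robin; a real
    system could instead use performer priority or profile matching."""
    plan = {p: [] for p in performers}
    for i, task in enumerate(microtasks):
        plan[performers[i % len(performers)]].append(task)
    return plan
```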
  19. Deployment: search on the social network
      • Multi-platform deployment
        [diagram: a generated query template is deployed as an embedded, external, native, or standalone application on top of the social/crowd platform's APIs and native embedding behaviours, reaching the community / crowd]
  20. Deployment: search on the social network
      • Multi-platform deployment
  21. Deployment: search on the social network
      • Multi-platform deployment
  22. Deployment: search on the social network
      • Multi-platform deployment
  23. Deployment: search on the social network
      • Multi-platform deployment
  24. Crowdsearch experiments
      • About 150 users
      • Two classes of experiments:
        • Random questions on fixed topics: interests (e.g. restaurants in the vicinity of Politecnico), famous 2011 songs, or top-quality EU soccer teams
        • Questions independently submitted by the users
      • Different invitation strategies:
        • Random invitation
        • Explicit selection of responders by the asker
      • Outcome:
        • 175 like and insert queries
        • 1536 invitations to friends
        • 230 answers
        • 95 questions (~55%) got at least one answer
  25. Experiments: Manual and random questions
  26. Experiments: Interest and relationship
      • Manually written and manually assigned questions consistently receive more responses over time
  27. Experiments: Query type
      • Engagement depends on the difficulty of the task
      • Like vs. Add tasks:
  28. Experiment: Social platform
      • The role of the question enactment platform
      • Facebook vs. Doodle
  29. Experiment: Posting time
      • The role of the question enactment platform
      • Facebook vs. Doodle
  30. EDBT 2013
  31. Problem
      • Ranking the members of a social group according to the level of knowledge that they have about a given topic
      • Application: crowd selection (for crowd searching or crowd sourcing)
      • Available data:
        • User profiles
        • The behavioral trace that users leave behind through their social activities
  32. Considered Features
      • User profiles
        • Plus linked Web pages
      • Social relationships
        • Facebook friendship
        • Twitter mutual following relationship
        • LinkedIn connections
      • Resource containers
        • Groups, Facebook pages
        • Linked pages
        • Users who are followed by a given user are resource containers
      • Resources
        • Material published in resource containers
  33. Feature Organization Meta-Model
  34. Example (Facebook)
  35. Example (Twitter)
  36. Resource Distance
      • Objects in the social graph are organized according to their distance with respect to the user profile
      • Why? Privacy, computational cost, platform access constraints

      Distance | Resource
      0        | Expert Candidate Profile
      0        | Expert Candidate owns/creates/annotates Resource
      1        | Expert Candidate relatedTo Resource Container
      1        | Expert Candidate follows UserProfile
      1        | Expert Candidate follows UserProfile relatedTo Resource Container
      1        | Expert Candidate relatedTo Resource Container contains Resource
      2        | Expert Candidate follows UserProfile owns/creates/annotates Resource
      2        | Expert Candidate follows UserProfile follows UserProfile
  37. Distance interpretation
      [same Distance / Resource table as slide 36, annotated with the interpretation of each distance level]
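A minimal sketch of how resources can be bucketed by distance from the candidate profile, assuming the social graph is available as an adjacency map. Note that this is a plain hop-count traversal for illustration; the levels in the table above also group a candidate's own resources at distance 0, so the real assignment is not a pure breadth-first distance.

```python
from collections import deque

def resources_by_distance(graph, candidate, max_distance=2):
    """Bucket social-graph nodes (resources, containers, profiles) by hop
    distance from the expert candidate's profile. `graph` maps a node to the
    nodes it owns, is related to, or follows."""
    buckets = {0: [candidate]}
    seen = {candidate}
    frontier = deque([(candidate, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_distance:
            continue  # bound the depth: privacy, cost, and API constraints
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                buckets.setdefault(dist + 1, []).append(neighbour)
                frontier.append((neighbour, dist + 1))
    return buckets
```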
  38. Resource Processing
      [pipeline: resource extraction → URL content extraction → language identification → text processing → entity recognition and disambiguation]
      • Extraction from social network APIs
      • Extraction of text from linked Web pages
        • Alchemy text extraction APIs
      • Language identification
      • Text processing
        • Sanitization, tokenization, stopword removal, lemmatization
      • Entity extraction and disambiguation
        • TagMe
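The text-processing step can be approximated with a few lines of standard Python. This sketch only covers sanitization, tokenization, and stopword removal; lemmatization and entity linking (e.g. via TagMe) would follow, and the stopword list here is a toy assumption.

```python
import re

STOPWORDS = {"the", "a", "an", "and", "or", "of", "in", "to", "is"}  # toy list

def preprocess(text):
    """Illustrative sanitization + tokenization + stopword removal."""
    text = re.sub(r"http\S+", " ", text)             # strip URLs before tokenizing
    tokens = re.findall(r"[a-zà-ù]+", text.lower())  # crude word tokenization
    return [t for t in tokens if t not in STOPWORDS]
```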
  39. Method – Resource Score
      [formula figure: the resource score combines a term component and an entity component, with an entity weighting factor]
      • tf(t,r): term frequency of term t in resource r; irf(t): inverse resource frequency of t
      • ef(e,r): entity frequency of entity e in resource r; eir(e): inverse entity frequency of e; we(e,r): relevance of entity e in resource r
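The slide lists the ingredients, but the combination itself was in the formula figure. A plausible tf-idf-style reconstruction, stated here only as an assumption and not as the paper's exact weighting, is:

```latex
% Hedged reconstruction of the resource score from the quantities listed above;
% the exact combination used in the EDBT 2013 paper may differ.
\mathit{score}(q, r) =
  \underbrace{\sum_{t \in q} \mathit{tf}(t, r)\,\mathit{irf}(t)}_{\text{term component}}
  \; + \;
  \underbrace{\sum_{e \in q} w_e(e, r)\,\mathit{ef}(e, r)\,\mathit{eir}(e)}_{\text{entity component}}
```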
  40. Method: Expert Score
      [formula figure: the expert score aggregates resource scores over a window of top resources (window size), each weighted by the resource weight for the given expertise]
      • Experts are ranked according to score(q, ex)
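A sketch of how the expert score might aggregate per-resource scores, assuming a windowed, weighted sum; the function and argument names are illustrative and the paper's exact aggregation may differ.

```python
def expert_score(resource_scores, expertise_weights, window_size=100):
    """Illustrative aggregation: weight each resource's query score by its
    weight for the target expertise, then sum the top `window_size` values.

    resource_scores:   {resource_id: score(q, r)} for the candidate's resources
    expertise_weights: {resource_id: weight of r for the given expertise}
    """
    weighted = [score * expertise_weights.get(r, 1.0)
                for r, score in resource_scores.items()]
    return sum(sorted(weighted, reverse=True)[:window_size])
```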
  41. Dataset
      • 7 kinds of expertise
        • Computer Engineering, Location, Movies & TV, Music, Science, Sport, Technology & Videogames
      • 40 volunteer users (on Facebook, Twitter, and LinkedIn)
      • 330,000 resources (70% with a URL to external resources)
      • Ground truth created through self-assessment
        • For each expertise, a vote on a 7-point Likert scale
        • Experts: users whose self-assessed expertise is above average
  42. Distribution of Expertise and Resources
      • Average expertise ~ 3.5 / 7
      • High Music and Sport expertise
      • Low Location expertise
      • High number of resources on Facebook and Twitter
      • Higher number of users on Facebook
      [chart: number of experts and average expertise per domain (Computer Engineering, Location, Movies & TV, Music, Science, Sport, Technology)]
  43. Metrics
      • We obtain lists of candidate experts and assess them against the ground truth, using:
        • For precision:
          • Mean Average Precision (MAP)
          • 11-Point Interpolated Average Precision (11-P)
        • For ranking:
          • Mean Reciprocal Rank (MRR): for the first relevant value
          • Normalized Discounted Cumulative Gain (nDCG): for more values; can be computed @N for the first N values
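These are standard IR measures; for reference, here are small self-contained helpers for two of them (MRR and nDCG@N). They follow the textbook definitions rather than any project-specific code.

```python
import math

def mean_reciprocal_rank(rankings, relevant):
    """MRR: average over queries of 1/rank of the first relevant candidate.
    rankings: {query: [candidate, ...]}, relevant: {query: set(candidates)}."""
    total = 0.0
    for q, ranking in rankings.items():
        for i, candidate in enumerate(ranking, start=1):
            if candidate in relevant[q]:
                total += 1.0 / i
                break
    return total / len(rankings)

def ndcg_at_n(ranking, gains, n=10):
    """nDCG@N with graded (or binary) gains given as {candidate: gain}."""
    dcg = sum(gains.get(c, 0.0) / math.log2(i + 1)
              for i, c in enumerate(ranking[:n], start=1))
    ideal = sorted(gains.values(), reverse=True)[:n]
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0
```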
  44. Metrics improve with resources
      • But this comes with a cost
  45. Friendship relationships are not useful
      • Inspecting friends' resources does not improve the metrics!
  46. Social Network Analysis
      • Comparison of the results obtained with all the social networks together, or separately with Facebook, Twitter, and LinkedIn
  47. Main Results
      • Profiles are less effective than level-1 resources
        • Resources produced by others help in describing each individual's expertise
      • Twitter is the most effective social network for expertise matching; it sometimes outperforms the other social networks
        • Twitter is most effective in Computer Engineering, Science, Technology & Games, Sport
      • Facebook is effective in Locations, Sport, Movies & TV, Music
      • LinkedIn is never very helpful in locating expertise
  48. WWW 2013 – Reactive Crowdsourcing
  49. Main Message
      • Crowd-sourcing should be dynamically adapted
      • The best way to do so is through active rules
      • Four kinds of rules: execution / object / performer / task control
        • Guaranteed termination
        • Extensibility
      [diagram: execution, object, performer, and task control components, connected by control production rules, result production rules, and execution modifier rules]
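As an illustration of the event-condition-action idea behind these control rules, here is a toy sketch; the rule structure, names, and the majority condition are assumptions for the example, not the WWW 2013 rule language.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ActiveRule:
    """Event-Condition-Action rule: when `event` occurs, run `action`
    only if `condition` holds on the current task state."""
    event: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], None]

def close_object(state):
    # Object control: stop collecting answers for this object.
    state["object_status"] = "closed"

majority_rule = ActiveRule(
    event="answer_received",
    condition=lambda s: s["likes"] >= s["planned_executions"] // 2 + 1,
    action=close_object,
)

def on_event(event, state, rules):
    """Minimal rule engine: evaluate every rule registered for the event."""
    for rule in rules:
        if rule.event == event and rule.condition(state):
            rule.action(state)

# Example: the third positive answer out of five planned executions closes the object.
state = {"likes": 3, "planned_executions": 5, "object_status": "open"}
on_event("answer_received", state, [majority_rule])
```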
