CrowdSearcher: Reactive and Multiplatform Crowdsourcing. Keynote speech at the DBCrowd 2013 workshop @ VLDB 2013

CROWDSEARCHER
Marco Brambilla, Stefano Ceri, Andrea Mauri, Riccardo Volonterio
Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria

Crowd-based Applications
• Emerging crowd-based applications:
  • opinion mining
  • localized information gathering
  • marketing campaigns
  • expert response gathering
• General structure:
  • the requestor poses some questions
  • a wide set of responders (typically unknown to the requestor) is in charge of providing answers
  • the system organizes a response collection campaign
• These include both crowdsourcing and crowdsearching

The "system" is a wide concept
• Crowd-based applications may use social networks and Q&A websites in addition to crowdsourcing platforms
• Our approach: a coordination engine that keeps overall control of application deployment and execution

CrowdSearcher
• Combines a conceptual framework, a specification paradigm, and a reactive execution control environment
• Supports designing, deploying, and monitoring applications on top of crowd-based systems
• Design is top-down and platform-independent
• Deployment turns declarative specifications into platform-specific implementations, covering social networks as well as crowdsourcing platforms
• Monitoring provides reactive control, which guarantees application adaptation and interoperability
• Developed in the context of Search Computing (SeCo, ERC Advanced Grant, 2008-2013)

An example of crowd-based application: crowd-search
• People do not completely trust web search
• They want direct feedback from people
• They expect recommendations, insights, opinions, reassurance

Crowd-searching after conventional search
• From search results to feedback from friends and experts
• (Diagram: an initial query goes to the search system, whose results feed questions to one or more social platforms.)

Example: Find your next job (exploration)

Example: Find your job (social invitation)
• Selected data items can be transferred to the crowd question

Find your job (response submission)

Crowdsearcher results (in the loop)

Deployment alternatives
• Multi-platform deployment
• (Diagram: deployment options include embedded applications and native behaviours on the social/crowd platform, and standalone or external applications via API embedding, addressing a community or crowd through generated query templates.)

Deployment: search on a social network
• Multi-platform deployment

From social workers to communities
• Issues and problems:
  • motivation of the responders
  • intensity of social activity of the asker
  • topic appropriateness
  • timing of the post (hour of the day, day of the week)
  • context and language barrier

THE MODEL AND THE PROCESS

The Design Process
• A simple task design and deployment process, based on specific data structures:
  • created using model-driven transformations
  • driven by the task specification
• Task Specification: task operations, objects, and performers
• Task Planning: work distribution
• Control Specification: task control policies

Task Specification
• Which are the input objects of the crowd interaction?
  • Do they have a schema (record of named and typed fields)?
• Which operations should the crowd perform?
  • Like, label, comment, add new instances, verify/modify data, order, etc.
• Who are the performers of the task? How should they be selected? And invited?
  • e.g. push vs. pull model
• Which quality criteria should be used for deciding the task outcome?
  • e.g. majority weighting, with/without spam detection
• Which platforms should be used? Which execution interface should be used?

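To make these questions concrete, here is a minimal sketch of what a declarative task specification answering them could look like. It is illustrative only: the field names and values are assumptions, not actual CrowdSearcher syntax; the politician-classification content mirrors the application used later in this talk.

    # Hypothetical declarative task specification; field names are
    # illustrative, not the actual CrowdSearcher vocabulary.
    task_spec = {
        "objects": {
            "schema": {"name": "string", "picture": "url"},  # typed fields
            "source": "politicians.json",
        },
        "operation": {
            "type": "classify",  # like, comment, tag, classify, add, modify, order
            "categories": ["Republican", "Democrat"],
        },
        "performers": {
            "selection": "push",        # push: performers are chosen
            "invitation": "social_post",
        },
        "quality": {
            "strategy": "majority",
            "redundancy": 9,            # evaluations per object
            "spam_detection": True,
        },
        "platforms": ["facebook", "amazon_mturk"],
    }
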
Operations
• In a task, performers are required to execute logical operations on input objects
  • e.g. "Locate the faces of the people appearing in the following 5 images"
• CrowdSearcher offers pre-defined operation types:
  • Like: ask a performer to express a preference (true/false), e.g. "Do you like this picture?"
  • Comment: ask a performer to write a description / summary / evaluation, e.g. "Can you summarize the following text in your own words?"
  • Tag: ask a performer to annotate an object with a set of tags, e.g. "How would you label the following image?"
  • Classify: ask a performer to classify an object within a closed set of alternatives, e.g. "Would you classify this tweet as pro-right, pro-left, or neutral?"
  • Add: ask a performer to add a new object conforming to the specified schema, e.g. "Can you list the name and address of good restaurants near Politecnico di Milano?"
  • Modify: ask a performer to verify/modify the content of one or more input objects, e.g. "Is this wine from Cinque Terre? If not, where does it come from?"
  • Order: ask a performer to order the input objects, e.g. "Order the following books according to your taste"

Task planning
Typical problems:
• Task structuring: the task is too complex or too critical to be executed as a single operation
• Task splitting: the input data collection is too large to be presented to a single user
• Task routing: a query can be distributed according to the values of some attribute of the collection

Micro Tasks
• The actual unit of interaction with a performer
• Mapping of objects to micro tasks (a sketch of one possible mapping follows):
  • How many objects in each micro task?
  • Which objects should appear in each micro task?
  • How often should an object appear across micro tasks?
  • Which objects cannot appear together?
  • Should objects always be presented in some order?

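As an illustration of the mapping questions above, the following minimal sketch chops a collection of objects into fixed-size micro tasks with a given redundancy. It is an assumed round-robin strategy with no co-occurrence or ordering constraints, not the planner CrowdSearcher actually implements; the 30 objects / redundancy 9 numbers echo the experiment reported later.

    # Sketch: map objects into micro tasks of `size` objects each, so that
    # every object appears in exactly `redundancy` micro tasks.
    # Round-robin placement; a real planner would add co-occurrence and
    # ordering constraints.
    def plan_micro_tasks(objects, size=3, redundancy=9):
        # repeat the whole collection `redundancy` times, then chop it up
        stream = [obj for _ in range(redundancy) for obj in objects]
        return [stream[i:i + size] for i in range(0, len(stream), size)]

    # Example: 30 politicians, 3 per micro task, 9 evaluations each
    # -> 90 micro tasks.
    micro_tasks = plan_micro_tasks([f"politician-{i}" for i in range(30)], 3, 9)
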
Assignment Strategy
• Given a set of micro tasks, which performers are assigned to them?
• Pull vs. push:
  • Pull: the performer chooses
  • Push: the performer is chosen
• Online vs. offline:
  • Online: micro tasks dynamically assigned to performers
    • first come / first served
    • based on the performer's performance
  • Offline: micro tasks statically assigned to performers
    • based on performers' priority
    • based on matching

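A minimal sketch of the pull/push distinction; the data structures (an open-task list, a performer scoring function) are illustrative assumptions, not the framework's API:

    # Sketch of the two assignment modes.
    def assign_pull(open_tasks, choice):
        # Pull: the performer chooses; `choice` is the performer's pick.
        return open_tasks[choice]

    def assign_push(open_tasks, performers, score):
        # Push: the system chooses, here by pairing each open micro task
        # with the best-scoring remaining performer.
        ranked = sorted(performers, key=score, reverse=True)
        return list(zip(open_tasks, ranked))
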
Invitation Strategy
• The process of inviting performers to perform micro tasks
• Can use very different mechanisms
• Essential in order to generate the appropriate performer reaction / reward
• Examples:
  • send an email to a mailing list
  • publish a HIT on Mechanical Turk
  • create a new challenge in your game
  • publish a post/tweet on your own social network profile
  • publish a post/tweet on your friends' profiles

Steps in Crowd-based Application Design
1) Task Design
2) Object and Performer Design
3) Micro Task Design

Step 1: Task Design

Step 2: Object and Performer Design

Step 3: Micro Task Design

Complete Meta-Model

Design Tool: Screenshot

Application instantiation (for Italian politics)
• Given the picture and name of a politician, specify his/her political affiliation
• No time limit
• Performers are encouraged to look the answer up online
• Two sets of rules:
  • Majority Evaluation
  • Spammer Detection

REACTIVITY AND MULTIPLATFORM

Crowd Control is tough…
• Several aspects make crowd engineering complicated:
  • task design, planning, assignment
  • worker discovery, assessment, engagement
• Controlling crowdsourcing tasks is a fundamental issue:
  • cost
  • time
  • quality
• Need for higher-level abstractions and tools

Reactive Crowdsourcing
• A conceptual framework for controlling the execution of crowd-based computations, based on:
  • Control Marts
  • Active Rules
• Classical forms of control:
  • majority control (to close object computations)
  • quality control (to check that quality constraints are met)
  • spam detection (to detect / eliminate some performers)
  • multi-platform adaptation (to change the deployment platform)
  • social adaptation (to change the community of performers)

Why Active Rules?
• Ease of use: control is easily expressible
  • simple formalism, simple computation
• Power: arbitrarily complex controls are supported
  • extensibility mechanisms
• Automation: active rules can be system-generated
  • well-defined semantics
• Flexibility: localized impact of changes on the rule set
  • control isolation
• Known formal properties descending from known theory
  • termination, confluence

Control Mart
• Data structure for controlling application execution, inspired by data marts (from data warehousing); its content is automatically built from the task specification and planning
• Central entity: MicroTask Object Execution
• Dimensions: Task / Operations, Performer, Object

Auxiliary Structures
• Object: tracking object responses
• Performer: tracking performer behavior (e.g. spammers)
• Task: tracking task status

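Reconstructing these structures from the rule examples shown below, a rough sketch in code might look as follows. Attribute names such as Eval (written #Eval in the rules), Rep, Dem, CurAnswer, and compObj come from those examples; everything else is an assumption.

    from dataclasses import dataclass

    @dataclass
    class MicroTaskObjectExecution:
        # central entity: one row per object evaluated within a micro task
        mtID: int
        oID: int                     # object dimension
        pID: int                     # performer dimension
        tID: int                     # task dimension
        ClassifiedParty: str = None  # the answer being collected

    @dataclass
    class ObjectControl:
        # auxiliary structure tracking responses per object
        oID: int
        Rep: int = 0                 # count of 'Republican' answers
        Dem: int = 0                 # count of 'Democrat' answers
        Eval: int = 0                # total evaluations (#Eval in the rules)
        CurAnswer: str = None        # current majority answer

    @dataclass
    class TaskControl:
        # auxiliary structure tracking task status
        tID: int
        compObj: int = 0             # number of completed objects
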
Active Rules Language
• Active rules are expressed on the previous data structures
• Event-Condition-Action (ECA) paradigm
• Events: data updates / timer
  • row-level granularity
  • OLD: the before-state of a row
  • NEW: the after-state of a row
• Condition: a predicate that must be satisfied (e.g. conditions on control mart attributes)
• Actions: updates on the data structures (e.g. changing attribute values, creating new instances) or special functions (e.g. replan)

e: UPDATE FOR μTaskObjectExecution[ClassifiedParty]
c: NEW.ClassifiedParty == 'Republican'
a: SET ObjectControl[oID == NEW.oID].#Eval += 1

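To make the execution model concrete, here is a minimal sketch of a row-level ECA engine, with the rule above rendered in it. This is an assumption-based illustration of the paradigm, not the CrowdSearcher implementation.

    # Minimal row-level ECA engine sketch. A rule fires when a watched
    # table is updated; OLD/NEW are the row states before/after the update.
    class Rule:
        def __init__(self, event_table, condition, action):
            self.event_table = event_table  # e: UPDATE FOR <table>
            self.condition = condition      # c: predicate over OLD/NEW
            self.action = action            # a: updates / special functions

    class Engine:
        def __init__(self, rules):
            self.rules = rules              # kept in top-to-bottom order

        def update(self, table, old_row, new_row):
            # evaluate rules in order; actions may trigger further updates
            for rule in self.rules:
                if rule.event_table == table and rule.condition(old_row, new_row):
                    rule.action(old_row, new_row, self)

    # Rule Example 1 rendered in this sketch:
    object_control = {"Eval": 0}
    rule1 = Rule(
        "μTaskObjectExecution",
        lambda old, new: new["ClassifiedParty"] == "Republican",
        lambda old, new, eng: object_control.update(Eval=object_control["Eval"] + 1),
    )
    engine = Engine([rule1])
    engine.update("μTaskObjectExecution", {}, {"ClassifiedParty": "Republican"})
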
Rule Example 1
e: UPDATE FOR μTaskObjectExecution[ClassifiedParty]
c: NEW.ClassifiedParty == 'Republican'
a: SET ObjectControl[oID == NEW.oID].#Eval += 1

Whenever a micro task execution records the answer 'Republican', the rule increments the evaluation counter of the corresponding object in the control mart.

Rule Example 2
e: UPDATE FOR ObjectControl
c: (NEW.Rep == 2) or (NEW.Dem == 2)
a: SET Politician[oid == NEW.oid].classifiedParty = NEW.CurAnswer,
   SET TaskControl[tID == NEW.tID].compObj += 1

When either party counter of an object reaches 2, the object is closed: the current majority answer is written into the Politician dimension table, and the task's completed-object counter is incremented.

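Read together, the two rule examples implement simple agreement-based closure: count answers per object and close the object once one alternative reaches a threshold. A plain-Python rendering of that combined logic follows; the maintenance of the Rep/Dem counters and CurAnswer is implied by the rules but not shown in the deck, so that part is an assumption.

    # What Rule Examples 1 and 2 jointly compute, in plain Python.
    def on_answer(obj_ctrl, task_ctrl, politician, answer, threshold=2):
        # Rule Example 1 (generalized): update the per-object counters
        obj_ctrl["Eval"] += 1
        obj_ctrl[answer] = obj_ctrl.get(answer, 0) + 1
        obj_ctrl["CurAnswer"] = max(("Rep", "Dem"),
                                    key=lambda a: obj_ctrl.get(a, 0))
        # Rule Example 2: close the object once one alternative agrees
        if obj_ctrl.get("Rep", 0) == threshold or obj_ctrl.get("Dem", 0) == threshold:
            politician["classifiedParty"] = obj_ctrl["CurAnswer"]
            task_ctrl["compObj"] += 1

    # Usage: two 'Rep' answers close the object.
    obj, task, pol = {"Eval": 0}, {"compObj": 0}, {}
    on_answer(obj, task, pol, "Rep")
    on_answer(obj, task, pol, "Rep")   # pol["classifiedParty"] == "Rep"
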
Rule Programming Best Practice
• We define three classes of rules:
  • Control rules: modify the control tables
  • Result rules: modify the dimension tables (object, performer, task)
  • Execution rules: modify the execution table, either directly or through re-planning
• Top-to-bottom, left-to-right evaluation
• With control and result rules only, termination is guaranteed
• Once execution rules are added, termination must be proven (the rule precedence graph has cycles)

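The termination argument follows classical active-rule theory: build a precedence graph with an edge from rule A to rule B whenever A's action can raise B's event, and check it for cycles. A minimal sketch, assuming the graph is given as an adjacency mapping:

    # Sketch: detect cycles in a rule precedence graph given as
    # {rule: [rules whose events its actions can trigger]}.
    def has_cycle(graph):
        WHITE, GRAY, BLACK = 0, 1, 2
        color = {r: WHITE for r in graph}

        def visit(r):
            color[r] = GRAY
            for nxt in graph.get(r, []):
                if color.get(nxt, WHITE) == GRAY:
                    return True                    # back edge: cycle found
                if color.get(nxt, WHITE) == WHITE and visit(nxt):
                    return True
            color[r] = BLACK
            return False

        return any(color[r] == WHITE and visit(r) for r in graph)

    # has_cycle({"control": ["result"], "result": []})        -> False
    # has_cycle({"exec": ["control"], "control": ["exec"]})   -> True
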
EXPERIMENTS

Crowdsearcher Experiment 1
• Goal: test engagement on social networks
• Some 150 users
• Two classes of experiments:
  • random questions on fixed topics of interest (e.g. restaurants in the vicinity of Politecnico, famous 2011 songs, top-quality EU soccer teams)
  • questions manually submitted by the users
• Different invitation strategies:
  • random invitation
  • explicit selection of responders by the asker
• Outcome:
  • 175 like and insert queries
  • 1536 invitations to friends
  • 230 answers
  • 95 questions (~55%) got at least one answer

Manual and Random Questions

Interest / Rewarding Factor
• Manually written and assigned questions consistently receive more answers over time

Query Type
• Engagement depends on the difficulty of the task
• Like vs. Add tasks

Comparison of Execution Platforms
• Facebook vs. Doodle

Posting Time
• Facebook vs. Doodle

Crowdsearcher Experiment 2
• Goal: demonstrate the flexibility and expressive power of reactive crowdsourcing
• 3 experiments, focused on Italian politicians:
  • Parties: human computation, affiliation classification
  • Law: game with a purpose, guess the convicted politician
  • Order: pure game, hot or not
• 1 week (November 2012)
• 284 distinct performers
  • recruited through public mailing lists and social network announcements
• 3500 micro tasks

Politician Affiliation
• Given the picture and name of a politician, specify his/her political affiliation
• No time limit
• Performers are encouraged to look the answer up online
• Two sets of rules:
  • Majority Evaluation
  • Spammer Detection

Results: Majority Evaluation (1/3)
• 30 objects; object redundancy = 9
• Final object classification by simple majority after 7 evaluations

Results: Majority Evaluation (2/3)
• Final object classification by total majority after 3 evaluations
• Otherwise, re-plan 4 additional evaluations, then simple majority at 7

Results: Majority Evaluation (3/3)
• Final object classification by total majority after 3 evaluations
• Otherwise, simple majority at 5 or at 7 (with re-planning)

Results: Spammer Detection (1/2)
• New rule for spammer detection without ground truth
• Performer correctness measured against the final majority; spammer if > 50% wrong classifications

Results: Spammer Detection (2/2)
• New rule for spammer detection without ground truth
• Performer correctness measured against the current majority; spammer if > 50% wrong classifications

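A minimal sketch of the second variant (agreement with the current majority, flagging performers with more than 50% disagreement); the data structures and names are illustrative assumptions:

    # Sketch: flag a performer as a spammer when more than half of their
    # classifications disagree with the current majority answer.
    def is_spammer(answers, current_majority, threshold=0.5):
        """answers: {object_id: this performer's answer};
        current_majority: {object_id: current majority answer}."""
        judged = [oid for oid in answers if oid in current_majority]
        if not judged:
            return False
        wrong = sum(answers[oid] != current_majority[oid] for oid in judged)
        return wrong / len(judged) > threshold
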
EXPERT FINDING IN CROWDSEARCHER

Problem
• Ranking the members of a social group according to the level of knowledge they have about a given topic
• Application: crowd selection (for crowd searching or sourcing)
• Available data:
  • user profiles
  • the behavioral trace that users leave behind through their social activities

Considered Features
• User profiles
  • plus linked web pages
• Social relationships
  • Facebook friendship
  • Twitter mutual-following relationship
  • LinkedIn connections
• Resource containers
  • groups, Facebook pages
  • linked pages
  • users who are followed by a given user are resource containers
• Resources
  • material published in resource containers

Feature Organization Meta-Model

Example (Facebook)

Example (Twitter)

Resource Distance
• Objects in the social graph are organized according to their distance from the user profile
• Why? Privacy, computational cost, platform access constraints

Distance interpretation:

  Distance 0:
    Expert Candidate Profile
  Distance 1:
    Expert Candidate owns/creates/annotates Resource
    Expert Candidate relatedTo Resource Container
    Expert Candidate follows UserProfile
  Distance 2:
    Expert Candidate follows UserProfile relatedTo Resource Container
    Expert Candidate relatedTo Resource Container contains Resource
    Expert Candidate follows UserProfile owns/creates/annotates Resource
    Expert Candidate follows UserProfile follows UserProfile

Resource Processing
• Extraction from social network APIs
• Extraction of text from linked web pages
  • Alchemy text extraction APIs
• Language identification
• Text processing
  • sanitization, tokenization, stopword removal, lemmatization
• Entity extraction and disambiguation
  • TagMe

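A minimal sketch of the plain text-processing step (sanitization, tokenization, stopword removal); the stopword list is a stand-in, and the Alchemy and TagMe calls, language identification, and lemmatization are omitted:

    import re

    STOPWORDS = {"the", "a", "an", "of", "and", "to", "in"}  # stand-in list

    # Sketch of sanitization -> tokenization -> stopword removal.
    def preprocess(text):
        text = re.sub(r"<[^>]+>", " ", text)          # strip HTML remnants
        tokens = re.findall(r"[a-z]+", text.lower())  # crude tokenizer
        return [t for t in tokens if t not in STOPWORDS]

    # preprocess("<p>The Politecnico of Milano</p>")
    # -> ['politecnico', 'milano']
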
Dataset
• 7 kinds of expertise:
  • Computer Engineering, Location, Movies & TV, Music, Science, Sport, Technology & Videogames
• 40 volunteer users (on Facebook, Twitter, and LinkedIn)
• 330,000 resources (70% with URLs to external resources)
• Ground truth created through self-assessment:
  • for each expertise, a vote on a 7-point Likert scale
  • EXPERTS: users whose expertise is above average

Metrics
• We obtain lists of candidate experts and assess them against the ground truth, using:
• For precision:
  • Mean Average Precision (MAP)
  • 11-Point Interpolated Average Precision (11-P)
• For ranking:
  • Mean Reciprocal Rank (MRR), for the first value
  • Normalized Discounted Cumulative Gain (nDCG), for more values; can be computed @N for the first N values

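For reference, the textbook definitions of these metrics (standard IR definitions, not specific to this work), over a query set Q:

    MRR = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i}

    MAP = \frac{1}{|Q|} \sum_{q \in Q} \frac{1}{|R_q|} \sum_{k=1}^{n_q} P_q(k)\, \mathrm{rel}_q(k)

    \mathrm{DCG@}N = \sum_{i=1}^{N} \frac{\mathrm{rel}_i}{\log_2(i+1)},
    \qquad
    \mathrm{nDCG@}N = \frac{\mathrm{DCG@}N}{\mathrm{IDCG@}N}

Here rank_i is the rank of the first relevant candidate for query i, P_q(k) is precision at cutoff k, rel is binary relevance, R_q is the set of relevant candidates for q, and IDCG is the DCG of the ideal ranking.
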
Metrics improve with more resources
• But this comes at a cost

Friendship relationships are not useful
• Inspecting friends' resources does not improve the metrics!

Social Network Analysis
• Comparison of the results obtained with all the social networks together, or separately with Facebook, Twitter, and LinkedIn

Main Results
• Profiles are less effective than level-1 resources
• Resources produced by others help in describing each individual's expertise
• Twitter is the most effective social network for expertise matching; sometimes it outperforms the other social networks
  • Twitter most effective for Computer Engineering, Science, Technology & Games, Sport
  • Facebook effective for Locations, Sport, Movies & TV, Music
  • LinkedIn never very helpful in locating expertise

CONCLUSIONS

Summary
• Results:
  • an integrated framework for crowdsourcing task design and control
  • well-structured control rules with guarantees of termination
  • support for cross-platform crowd interoperability
  • a working prototype: crowdsearcher.search-computing.org
• Forthcoming:
  • publication of the web interface + API
  • support of declarative options for automatic rule generation
  • integration with more social networks and human computation platforms
  • vertical solutions for specific markets
  • more applications and experiments (e.g. at Expo 2015)

QUESTIONS?
