Transcript

  • 1. CROWDSEARCHER Marco Brambilla, Stefano Ceri, Andrea Mauri, Riccardo Volonterio Politecnico di Milano Dipartimento di Elettronica, Informazione e BioIngegneria Crowdsearcher 1
  • 2. Crowd-based Applications • Emerging crowd-based applications: • opinion mining • localized information gathering • marketing campaigns • expert response gathering • General structure: • the requestor poses some questions • a wide set of responders are in charge of providing answers (typically unknown to the requestor) • the system organizes a response collection campaign • Include crowdsourcing and crowdsearching Crowdsearcher 2
  • 3. The “system” is a wide concept • Crowd-based applications may use social networks and Q&A websites in addition to crowdsourcing platforms • Our approach: a coordination engine which keeps overall control on the application deployment and execution Crowdsearcher 3 [diagram: CrowdSearcher engine with API access]
  • 4. CrowdSearcher • Combines a conceptual framework, a specification paradigm and a reactive execution control environment • Supports designing, deploying, and monitoring applications on top of crowd-based systems • Design is top-down, platform-independent • Deployment turns declarative specifications into platform-specific implementations which include social networks and crowdsourcing platforms • Monitoring provides reactive control, which guarantees applications’ adaptation and interoperability • Developed in the context of Search Computing (SeCo, ERC Advanced Grant, 2008-2013) Crowdsearcher 4
  • 5. An example of crowd-based application: crowd-search • People do not trust web search completely • Want to get direct feedback from people • Expect recommendations, insights, opinions, reassurance Crowdsearcher 7
  • 6. Crowd-searching after conventional search • From search results to friends’ and experts’ feedback [diagram: initial query → Search System → Human Search System → Social Platforms] Crowdsearcher 8
  • 7. Example: Find your next job (exploration) Crowdsearcher 9
  • 8. Example: Find your job (social invitation) Crowdsearcher 10
  • 9. Example: Find your job (social invitation) Selected data items can be transferred to the crowd question Crowdsearcher 11
  • 10. Find your job (response submission) Crowdsearcher 12
  • 11. Crowdsearcher results (in the loop) Crowdsearcher 13
  • 12. Deployment alternatives • Multi-platform deployment [diagram: deployment options include native behaviours on the social/crowd platform, an embedded application, and an external standalone application using the platform API, reaching the community/crowd through a generated query template] Crowdsearcher 14
  • 13. Deployment: search on a social network • Multi-platform deployment Crowdsearcher 15
  • 14. Deployment: search on the social network • Multi-platform deployment Crowdsearcher 16
  • 15. Deployment: search on the social network • Multi-platform deployment Crowdsearcher 17
  • 16. Deployment: search on the social network • Multi-platform deployment Crowdsearcher 18
  • 17. From social workers to communities • Issues and problems • Motivation of the responders • Intensity of social activity of the asker • Topic appropriateness • Timing of the post (hour of the day, day of the week) • Context and language barrier Crowdsearcher 19
  • 18. THE MODEL AND THE PROCESS Crowdsearcher 20
  • 19. The Design Process • A simple task design and deployment process, based on specific data structures • created using model-driven transformations • driven by the task specification [diagram: Task Specification → Task Planning → Control Specification] Crowdsearcher 21 • Task Specification: task operations, objects, and performers • Task Planning: work distribution • Control Specification: task control policies
  • 20. Task Specification • What are the input objects of the crowd interaction? • Do they have a schema (record of named and typed fields)? • Which operations should the crowd perform? • Like, label, comment, add new instances, verify/modify data, order, etc. • Who are the performers of the task? How should they be selected? And invited? • e.g. push vs pull model • Which quality criteria should be used for deciding the task outcome? • e.g., majority weighting, with/without spam detection • Which platforms should be used? Which execution interface should be used? Crowdsearcher 22
  • 21. Operations • In a Task, performers are required to execute logical operations on input objects • e.g. Locate the faces of the people appearing in the following 5 images • CrowdSearcher offers pre-defined operation types: • Like: Ask a performer to express a preference (true/false) • e.g. Do you like this picture? • Comment: Ask a performer to write a description / summary / evaluation • e.g. Can you summarize the following text using your own words? • Tag: Ask a performer to annotate an object with a set of tags • e.g. How would you label the following image? • Classify: Ask a performer to classify an object within a closed set of alternatives • e.g. Would you classify this tweet as pro-right, pro-left, or neutral? • Add: Ask a performer to add a new object conforming to the specified schema • e.g. Can you list the name and address of good restaurants near Politecnico di Milano? • Modify: Ask a performer to verify/modify the content of one or more input objects • e.g. Is this wine from Cinque Terre? If not, where does it come from? • Order: Ask a performer to order the input objects • e.g. Order the following books according to your taste Crowdsearcher 23
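To make the task-specification questions above concrete, here is a minimal sketch, in Python, of how a declarative specification with a typed operation might look; the class and field names are illustrative assumptions, not the actual CrowdSearcher API.

from dataclasses import dataclass, field
from typing import List

# Illustrative sketch only: class and field names are assumptions, not the
# actual CrowdSearcher API.
OPERATION_TYPES = {"like", "comment", "tag", "classify", "add", "modify", "order"}

@dataclass
class Operation:
    op_type: str                    # one of OPERATION_TYPES
    question: str                   # text shown to the performer
    categories: List[str] = field(default_factory=list)   # used by "classify"

    def __post_init__(self):
        if self.op_type not in OPERATION_TYPES:
            raise ValueError(f"unknown operation type: {self.op_type}")

@dataclass
class TaskSpecification:
    name: str
    object_schema: dict             # field name -> type, e.g. {"name": str}
    operations: List[Operation]
    objects: List[dict]             # input objects conforming to the schema

# Example: the politician-affiliation classification task used later in the talk
task = TaskSpecification(
    name="politician-affiliation",
    object_schema={"name": str, "photo_url": str},
    operations=[Operation("classify",
                          "Which party does this politician belong to?",
                          categories=["Republican", "Democrat"])],
    objects=[{"name": "Jane Doe", "photo_url": "http://example.org/jane.jpg"}],
)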
  • 22. Task planning Typical problems: • Task structuring: the task is too complex or too critical to be executed as a single operation. • Task splitting: the input data collection is too large to be presented to a user. • Task routing: a query can be distributed according to the values of some attribute of the collection. Crowdsearcher 24
  • 23. Micro Tasks • The actual unit of interaction with a performer. • Mapping of objects to Micro Tasks: • How many objects in each MicroTask? • Which objects should appear in each MicroTask? • How often should an object appear in MicroTasks? • Which objects cannot appear together? • Should objects always be presented in some order? Crowdsearcher 25
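A small sketch of one possible object-to-MicroTask mapping, controlling redundancy (how often each object is evaluated) and the number of objects per MicroTask; this is an assumed greedy strategy, not the planner used by CrowdSearcher.

import random
from typing import List

def plan_micro_tasks(objects: List[dict], objects_per_microtask: int,
                     redundancy: int, seed: int = 0) -> List[List[dict]]:
    # Replicate each object `redundancy` times, shuffle the pool, then greedily
    # fill MicroTasks of at most `objects_per_microtask` items, never putting
    # two copies of the same object into one MicroTask.
    rng = random.Random(seed)
    pool = [obj for obj in objects for _ in range(redundancy)]
    rng.shuffle(pool)
    micro_tasks: List[List[dict]] = []
    for obj in pool:
        target = next((mt for mt in micro_tasks
                       if len(mt) < objects_per_microtask and obj not in mt), None)
        if target is None:
            micro_tasks.append([obj])
        else:
            target.append(obj)
    return micro_tasks

# Example: 30 objects, 3 objects per MicroTask, each object evaluated 9 times
objects = [{"oID": f"p{i}"} for i in range(30)]
print(len(plan_micro_tasks(objects, objects_per_microtask=3, redundancy=9)))  # roughly 90 MicroTasks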
  • 24. Assignment Strategy • Given a set of MicroTasks, which performers are assigned to them? • Pull vs Push: • Pull: The performer chooses • Push: The performer is chosen • Online vs offline: • Online: MicroTasks dynamically assigned to performers • First come / First served • Based on performer’s performance • Offline: MicroTasks statically assigned to performers • Based on performers’ priority • Based on matching Crowdsearcher 26
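A hedged sketch contrasting the two assignment modes: pull simply exposes the open MicroTasks, while push ranks performers with a matching score; the scoring function below is a placeholder assumption.

from typing import Callable, List, Optional

def pull_next(open_micro_tasks: List[dict]) -> Optional[dict]:
    # Pull: the performer chooses; here we simply expose the first open task.
    return open_micro_tasks[0] if open_micro_tasks else None

def push_assign(micro_task: dict, performers: List[dict],
                match_score: Callable[[dict, dict], float]) -> Optional[dict]:
    # Push: the performer is chosen, e.g. by the best task/performer match.
    if not performers:
        return None
    return max(performers, key=lambda p: match_score(micro_task, p))

# Placeholder matching function (purely illustrative): overlap between the
# task's topic tags and the performer's interest tags.
def tag_overlap(micro_task: dict, performer: dict) -> float:
    return len(set(micro_task.get("tags", [])) & set(performer.get("interests", [])))

# Example
task = {"tags": ["music", "rock"]}
people = [{"name": "alice", "interests": ["rock"]}, {"name": "bob", "interests": ["soccer"]}]
print(push_assign(task, people, tag_overlap))   # alice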
  • 25. Invitation Strategy • The process of inviting performers to perform Micro Tasks • Can use very different mechanisms • Essential in order to generate the appropriate performer reaction / reward. • Examples: • Send an email to a mailing list • Publish a HIT on Mechanical Turk • Create a new challenge in your game • Publish a post/tweet on your social network profile • Publish a post/tweet on your friends' profile Crowdsearcher 27
  • 26. Steps in Crowd-based Application Design 1) Task Design 2) Object and Performer Design 3) Micro Task Design
  • 27. Step 1. Task Design Crowdsearcher 29
  • 28. Step 2: Object and Performer Design
  • 29. Step 3: MicroTask Design Crowdsearcher 31
  • 30. Complete Meta-Model Crowdsearcher 32
  • 31. Design Tool: Screenshot Crowdsearcher 33
  • 32. Application instantiation (for Italian Politics) • Given the picture and name of a politician, specify his/her political affiliation • No time limit • Performers are encouraged to look up online • 2 sets of rules • Majority Evaluation • Spammer Detection Crowdsearcher 34
  • 33. REACTIVITY AND MULTIPLATFORM Crowdsearcher 35
  • 34. Crowd Control is tough… • There are several aspects that make crowd engineering complicated • Task design, planning, assignment • Worker discovery, assessment, engagement Crowdsearcher 36
  • 35. Crowd Control is tough… • There are several aspects that make crowd engineering complicated • Task design, planning, assignment • Worker discovery, assessment, engagement • Controlling crowdsourcing tasks is a fundamental issue • Cost • Time • Quality • Need for higher-level abstraction and tools Crowdsearcher 37
  • 36. Reactive Crowdsourcing • A conceptual framework for controlling the execution of crowd-based computations. Based on: • Control Marts • Active Rules • Classical forms of controls: • Majority control (to close object computations) • Quality control (to check that quality constraints are met) • Spam detection (to detect / eliminate some performers) • Multi-platform adaptation (to change the deployment platform) • Social adaptation (to change the community of performers) Crowdsearcher 38
  • 37. Why Active Rules? • Ease of Use: control is easily expressible • Simple formalism, simple computation • Power: arbitrarily complex controls are supported • Extensibility mechanisms • Automation: active rules can be system-generated • Well-defined semantics • Flexibility: localized impact of changes on the rule set • Control isolation • Known formal properties descending from known theory • Termination, confluence Crowdsearcher 39
  • 38. Control Mart • Data structure for controlling application execution, inspired by data marts (for data warehousing); content is automatically built from task specification & planning • Central entity: MicroTask Object Execution • Dimensions: Task / Operations, Performer, Object Crowdsearcher 40 [diagram: Task Specification → Task Planning → Control Specification]
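As a mental model only (the real mart is generated automatically from the task specification and planning), the control mart and its dimension tables can be pictured as a few keyed tables; the names below mirror the slides but are otherwise assumed.

# Toy, in-memory picture of the control mart; table and column names are
# assumptions mirroring the slides, not the generated CrowdSearcher schema.
micro_task_object_execution = []   # central table: one row per (MicroTask, object, performer) execution,
                                   # e.g. {"mtID": ..., "oID": ..., "pID": ..., "ClassifiedParty": ..., "ts": ...}
object_control = {}     # oID -> {"#Eval": 0, "Rep": 0, "Dem": 0, "CurAnswer": None, "closed": False}
performer_control = {}  # pID -> {"#Answers": 0, "#AgreeWithMajority": 0, "spammer": False}
task_control = {}       # tID -> {"compObj": 0, "totObj": 0, "closed": False}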
  • 39. Auxiliary Structures • Object: tracking object responses • Performer: tracking performer behavior (e.g. spammers) • Task: tracking task status Crowdsearcher 41 [diagram: Task Specification → Task Planning → Control Specification]
  • 40. Active Rules Language • Active rules are expressed on the previous data structures • Event-Condition-Action paradigm Crowdsearcher 42
  • 41. Active Rules Language • Active rules are expressed on the previous data structures • Event-Condition-Action paradigm • Events: data updates / timer • ROW-level granularity • OLD → before state of a row • NEW → after state of a row Crowdsearcher 43 e: UPDATE FOR μTaskObjectExecution[ClassifiedParty]
  • 42. Active Rules Language • Active rules are expressed on the previous data structures • Event-Condition-Action paradigm • Events: data updates / timer • ROW-level granularity • OLD → before state of a row • NEW → after state of a row • Condition: a predicate that must be satisfied (e.g. conditions on control mart attributes) Crowdsearcher 44 e: UPDATE FOR μTaskObjectExecution[ClassifiedParty] c: NEW.ClassifiedParty == ’Republican’
  • 43. Active Rules Language • Active rules are expressed on the previous data structures • Event-Condition-Action paradigm • Events: data updates / timer • ROW-level granularity • OLD → before state of a row • NEW → after state of a row • Condition: a predicate that must be satisfied (e.g. conditions on control mart attributes) • Actions: updates on data structures (e.g. change attribute value, create new instances), special functions (e.g. replan) Crowdsearcher 45 e: UPDATE FOR μTaskObjectExecution[ClassifiedParty] c: NEW.ClassifiedParty == ’Republican’ a: SET ObjectControl[oID == NEW.oID].#Eval += 1
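The rules themselves are written in the CrowdSearcher rule language shown above; the following Python sketch of the Event-Condition-Action evaluation loop is only an interpretation of that semantics, with Rule Example 1 transcribed into it.

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Rule:
    # The event is an UPDATE on a row of a given table; here the engine is
    # invoked once per updated row, so the event is implicit.
    condition: Callable[[dict, dict], bool]   # (OLD row, NEW row) -> bool
    action: Callable[[dict, dict], None]      # (OLD row, NEW row) -> side effects

def on_row_update(rules: List[Rule], old_row: dict, new_row: dict) -> None:
    # Row-level ECA evaluation: fire every rule whose condition holds on OLD/NEW.
    for rule in rules:
        if rule.condition(old_row, new_row):
            rule.action(old_row, new_row)

# Rule Example 1 transcribed into this toy engine: when an execution row is
# classified as 'Republican', increment the object's #Eval counter in the
# (assumed) ObjectControl table.
object_control: Dict[str, dict] = {}   # oID -> {"#Eval": int}

def increment_eval(old_row: dict, new_row: dict) -> None:
    ctrl = object_control.setdefault(new_row["oID"], {"#Eval": 0})
    ctrl["#Eval"] += 1

rule_example_1 = Rule(
    condition=lambda old_row, new_row: new_row.get("ClassifiedParty") == "Republican",
    action=increment_eval,
)

# Simulate one performer answer arriving for object "p42":
on_row_update([rule_example_1], {}, {"oID": "p42", "ClassifiedParty": "Republican"})
print(object_control)   # {'p42': {'#Eval': 1}}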
  • 44. Rule Example 1 e: UPDATE FOR μTaskObjectExecution[ClassifiedParty] c: NEW.ClassifiedParty == ’Republican’ a: SET ObjectControl[oID == NEW.oID].#Eval += 1 Crowdsearcher 46
  • 47. Rule Example 2 e: UPDATE FOR ObjectControl c: (NEW.Rep == 2) or (NEW.Dem == 2) a: SET Politician[oid == NEW.oid].classifiedParty = NEW.CurAnswer, SET TaskControl[tID == NEW.tID].compObj += 1 Crowdsearcher 49
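Rule Example 2 closes an object once one answer reaches two evaluations and counts it as completed for its task; a plain-Python sketch of the same majority-closure logic (table and field names assumed) follows.

def record_answer(object_control: dict, politician: dict, task_control: dict,
                  oID: str, tID: str, answer: str, threshold: int = 2) -> None:
    # Update per-object counters and, once one party reaches `threshold`
    # evaluations, write the final classification and count the object as
    # completed for its task (mirrors Rule Example 2).
    ctrl = object_control.setdefault(oID, {"Rep": 0, "Dem": 0, "CurAnswer": None})
    if answer == "Republican":
        ctrl["Rep"] += 1
    elif answer == "Democrat":
        ctrl["Dem"] += 1
    ctrl["CurAnswer"] = "Republican" if ctrl["Rep"] >= ctrl["Dem"] else "Democrat"

    if ctrl["Rep"] == threshold or ctrl["Dem"] == threshold:
        politician.setdefault(oID, {})["classifiedParty"] = ctrl["CurAnswer"]
        task_control.setdefault(tID, {"compObj": 0})["compObj"] += 1

# Example: two agreeing answers close object "p42"
oc, pol, tc = {}, {}, {}
for a in ["Republican", "Republican"]:
    record_answer(oc, pol, tc, "p42", "t1", a)
print(pol)   # {'p42': {'classifiedParty': 'Republican'}}
print(tc)    # {'t1': {'compObj': 1}}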
  • 51. Rule Programming Best Practice • We define three classes of rules Crowdsearcher 53
  • 52. Rule Programming Best Practice Crowdsearcher 54 • We define three classes of rules • Control rules: modifying the control tables;
  • 53. Rule Programming Best Practice Crowdsearcher 55 • We define three classes of rules • Control rules: modifying the control tables; • Result rules: modifying the dimension tables (object, performer, task);
  • 54. Rule Programming Best Practice Crowdsearcher 56 • Top-to-bottom, left-to-right evaluation • Guaranteed termination • We define three classes of rules • Control rules: modifying the control tables; • Result rules: modifying the dimension tables (object, performer, task);
  • 55. Rule Programming Best Practice • We define three classes of rules • Control rules: modifying the control tables; • Result rules: modifying the dimension tables (object, performer, task); • Execution rules: modifying the execution table, either directly or through re-planning Crowdsearcher 57 • Termination must be proven (rule precedence graph has cycles)
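Termination can be argued on a rule precedence graph, with an edge from rule A to rule B whenever A's action may raise B's event; a small sketch of that cycle check, assuming each rule declares the tables it triggers on and the tables it writes, is shown below.

from typing import Dict, Set

def precedence_edges(rules: Dict[str, dict]) -> Dict[str, Set[str]]:
    # rules: name -> {"triggers_on": {tables}, "writes": {tables}}.
    # Rule A precedes rule B if A writes a table that B triggers on.
    return {a: {b for b, rb in rules.items() if ra["writes"] & rb["triggers_on"]}
            for a, ra in rules.items()}

def has_cycle(graph: Dict[str, Set[str]]) -> bool:
    # Detect a cycle with an iterative DFS (white/grey/black colouring).
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}
    for start in graph:
        if colour[start] != WHITE:
            continue
        stack = [(start, iter(graph[start]))]
        colour[start] = GREY
        while stack:
            node, children = stack[-1]
            for child in children:
                if colour[child] == GREY:
                    return True
                if colour[child] == WHITE:
                    colour[child] = GREY
                    stack.append((child, iter(graph[child])))
                    break
            else:
                colour[node] = BLACK
                stack.pop()
    return False

# Example with two of the rules sketched above (declarations are assumptions):
rules = {
    "R_eval":     {"triggers_on": {"Execution"},     "writes": {"ObjectControl"}},
    "R_majority": {"triggers_on": {"ObjectControl"}, "writes": {"Politician", "TaskControl"}},
}
print(has_cycle(precedence_edges(rules)))   # False: this rule set terminates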
  • 56. EXPERIMENTS Crowdsearcher 58
  • 57. Crowdsearcher Experiment 1 • Goal: Test engagement on social networks • Some 150 users • Two classes of experiments: • Random questions on fixed topics: user interests (e.g. restaurants in the vicinity of Politecnico), famous 2011 songs, or top-quality EU soccer teams • Questions manually submitted by the users • Different invitation strategies: • Random invitation • Explicit selection of responders by the asker • Outcome • 175 like and insert queries • 1536 invitations to friends • 230 answers • 95 questions (~55%) got at least one answer Crowdsearcher 59
  • 58. Manual and Random Questions Crowdsearcher 60
  • 59. Interest / Rewarding Factor • Manually written and assigned questions consistently receive more responses over time Crowdsearcher 61
  • 60. Query Type • Engagement depends on the difficulty of the task • Like vs. Add tasks: Crowdsearcher 62
  • 61. Comparison of Execution Platforms • Facebook vs. Doodle Crowdsearcher 64
  • 62. Posting Time • Facebook vs. Doodle Crowdsearcher 65
  • 63. Crowdsearcher Experiment 2 • GOAL: demonstrate the flexibility and expressive power of reactive crowdsourcing • 3 experiments, focused on Italian politicians • Parties: Human Computation → affiliation classification • Law: Game With a Purpose → guess the convicted politician • Order: Pure Game → hot or not • 1 week (November 2012) • 284 distinct performers • Recruited through public mailing lists and social network announcements • 3500 Micro Tasks Crowdsearcher 66
  • 64. Politician Affiliation • Given the picture and name of a politician, specify his/her political affiliation • No time limit • Performers are encouraged to look up online • 2 sets of rules • Majority Evaluation • Spammer Detection Crowdsearcher 67
  • 65. Results – Majority Evaluation_1/3 Crowdsearcher 68 30 objects; object redundancy = 9; final object classification by simple majority after 7 evaluations
  • 66. Results – Majority Evaluation_2/3 Crowdsearcher 69 Final object classification by total majority after 3 evaluations. Otherwise, re-plan of 4 additional evaluations, then simple majority at 7
  • 67. Results – Majority Evaluation_3/3 Crowdsearcher 70 Final object classification by total majority after 3 evaluations. Otherwise, simple majority at 5 or at 7 (with replan)
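A sketch of this last strategy (total majority after 3 evaluations, otherwise simple majority at 5 or at 7 after a replan); the decision function is an interpretation of the slides, not the published rule set.

from collections import Counter
from typing import List, Optional, Tuple

def decide(evaluations: List[str]) -> Tuple[Optional[str], bool]:
    # Return (final_label_or_None, replan_needed) for one object:
    # unanimous after 3 evaluations -> close; otherwise simple majority at 5,
    # or at 7 after replanning additional evaluations.
    n = len(evaluations)
    counts = Counter(evaluations)
    label, top = counts.most_common(1)[0] if counts else (None, 0)
    if n == 3 and top == 3:
        return label, False                 # total (unanimous) majority: close
    if n in (5, 7) and top > n / 2:
        return label, False                 # simple majority: close
    if n in (3, 5):
        return None, True                   # no decision yet: replan more evaluations
    if n >= 7:
        return label, False                 # force a decision at 7
    return None, False                      # still below the first checkpoint

print(decide(["Rep", "Rep", "Rep"]))        # ('Rep', False): total majority at 3
print(decide(["Rep", "Rep", "Dem"]))        # (None, True): replan more evaluations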
  • 68. Results – Spammer Detection_1/2 Crowdsearcher 71 New rule for spammer detection without ground truth: performer correctness evaluated against the final majority; spammer if > 50% wrong classifications
  • 69. Results – Spammer Detection_2/2 Crowdsearcher 72 New rule for spammer detection without ground truth: performer correctness evaluated against the current majority; spammer if > 50% wrong classifications
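A sketch of the ground-truth-free spammer check: each performer's answers are compared with the per-object majority, and a performer is flagged when more than 50% of their classifications disagree with it; thresholds and row shapes are assumptions.

from collections import Counter
from typing import Dict, List

def detect_spammers(answers: List[dict], threshold: float = 0.5) -> Dict[str, bool]:
    # answers: rows like {"pID": ..., "oID": ..., "label": ...}.
    # A performer is flagged when more than `threshold` of their answers
    # disagree with the per-object majority (no ground truth needed).
    per_object: Dict[str, Counter] = {}
    for row in answers:
        per_object.setdefault(row["oID"], Counter())[row["label"]] += 1
    majority = {oid: counts.most_common(1)[0][0] for oid, counts in per_object.items()}

    wrong: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for row in answers:
        pid = row["pID"]
        total[pid] = total.get(pid, 0) + 1
        if row["label"] != majority[row["oID"]]:
            wrong[pid] = wrong.get(pid, 0) + 1
    return {pid: wrong.get(pid, 0) / total[pid] > threshold for pid in total}

# Example
rows = [{"pID": "alice", "oID": "p1", "label": "Rep"},
        {"pID": "bob",   "oID": "p1", "label": "Rep"},
        {"pID": "carl",  "oID": "p1", "label": "Dem"}]
print(detect_spammers(rows))   # {'alice': False, 'bob': False, 'carl': True}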
  • 70. EXPERT FINDING IN CROWDSEARCHER Crowdsearcher 73
  • 71. Problem • Ranking the members of a social group according to the level of knowledge that they have about a given topic • Application: crowd selection (for Crowd Searching or Sourcing) • Available data • User profile • The behavioral trace that users leave behind through their social activities Crowdsearcher 74
  • 72. Considered Features • User Profiles • Plus Linked Web Pages • Social Relationships • Facebook Friendship • Twitter mutual following relationship • LinkedIn Connections • Resource Containers • Groups, Facebook Pages • Linked Pages • Users who are followed by a given user are resource containers • Resources • Material published in resource containers Crowdsearcher 75
  • 73. Feature Organization Meta-Model Crowdsearcher 76
  • 74. Example (Facebook) Crowdsearcher 77
  • 75. Example (Twitter) Crowdsearcher 78
  • 76. Resource Distance • Objects in the social graph are organized according to their distance from the user profile • Why? Privacy, computational cost, platform access constraints • Distance 0: Expert Candidate profile • Distance 1: Expert Candidate owns/creates/annotates Resource; Expert Candidate relatedTo Resource Container; Expert Candidate follows UserProfile • Distance 2: Expert Candidate follows UserProfile relatedTo Resource Container; Expert Candidate relatedTo Resource Container contains Resource; Expert Candidate follows UserProfile owns/creates/annotates Resource; Expert Candidate follows UserProfile follows UserProfile Crowdsearcher 79
  • 77. Distance interpretation • [same distance/resource table as the previous slide] Crowdsearcher 80
  • 78. Resource Processing • Extraction from Social Network APIs • Extraction of Text from linked Web Pages • Alchemy Text Extraction APIs • Language Identification • Text Processing • Sanitization, tokenization, stopword, lemmatization • Entity Extraction and Disambiguation • TagMe Crowdsearcher 81
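A minimal sketch of the text-processing stage (sanitization, tokenization, stopword removal; lemmatization is only stubbed, since a real pipeline would call an external lemmatizer); the Alchemy text extraction and TagMe entity-extraction steps are not reproduced.

import re
from typing import List, Set

STOPWORDS: Set[str] = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "at", "out"}

def sanitize(text: str) -> str:
    # Strip HTML tags, URLs and non-letter characters, then lowercase.
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"[^a-zA-Z\s]", " ", text).lower()

def tokenize(text: str) -> List[str]:
    return text.split()

def lemmatize(token: str) -> str:
    # Stub: a real pipeline would call a lemmatizer (e.g. NLTK's WordNetLemmatizer).
    return token

def process(text: str) -> List[str]:
    return [lemmatize(t) for t in tokenize(sanitize(text)) if t not in STOPWORDS]

# Example
print(process("<p>Check out the new album at http://example.org - it is great!</p>"))
# ['check', 'new', 'album', 'great']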
  • 79. Dataset • 7 kinds of expertise • Computer Engineering, Location, Movies & TV, Music, Science, Sport, Technology & Videogames • 40 volunteer users (on Facebook, Twitter & LinkedIn) • 330,000 resources (70% with URL to external resources) • Ground truth created through self-assessment • For each expertise, users vote on a 7-point Likert scale • EXPERTS → expertise above average Crowdsearcher 84
  • 80. Metrics • We obtain lists of candidate experts and assess them against the ground truth, using: • For precision: • Mean Average Precision (MAP) • 11-Point Interpolated Average Precision (11-P) • For ranking: • Mean Reciprocal Rank (MRR) – considers the first relevant result • Normalized Discounted Cumulative Gain (nDCG) – considers more results; can be cut at @N for the first N results Crowdsearcher 86
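A compact sketch of two of these measures, average precision (whose mean over topics is MAP) and reciprocal rank (whose mean is MRR), computed from a ranked list of candidate experts against a ground-truth set; this follows the standard definitions, not the paper's code.

from typing import List, Set

def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    # AP of one ranked list; MAP is the mean of AP over all queries/topics.
    hits, score = 0, 0.0
    for i, candidate in enumerate(ranked, start=1):
        if candidate in relevant:
            hits += 1
            score += hits / i
    return score / len(relevant) if relevant else 0.0

def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    # RR of one ranked list; MRR is its mean over all queries/topics.
    for i, candidate in enumerate(ranked, start=1):
        if candidate in relevant:
            return 1.0 / i
    return 0.0

# Example: candidate experts for "music", ground truth = {alice, carol}
print(average_precision(["bob", "alice", "carol"], {"alice", "carol"}))  # ~0.583
print(reciprocal_rank(["bob", "alice", "carol"], {"alice", "carol"}))    # 0.5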
  • 81. Metrics improve with resources • But this comes at a cost Crowdsearcher 87
  • 82. Friendship relationships are not useful • Inspecting friends’ resources does not improve the metrics! Crowdsearcher 88
  • 83. Social Network Analysis • A comparison of the results obtained with all the social networks together, or separately with Facebook, Twitter, and LinkedIn. Crowdsearcher 89
  • 84. Main Results • Profiles are less effective than level-1 resources • Resources produced by others help in describing each individual’s expertise • Twitter is the most effective social network for expertise matching – sometimes it outperforms the other social networks • Twitter is most effective in Computer Engineering, Science, Technology & Games, Sport • Facebook is effective in Locations, Sport, Movies & TV, Music • LinkedIn is never very helpful in locating expertise Crowdsearcher 90
  • 85. CONCLUSIONS Crowdsearcher 95
  • 86. Summary • Results • An integrated framework for crowdsourcing task design and control • Well-structured control rules with guarantees of termination • Support for cross-platform crowd interoperability • A working prototype → crowdsearcher.search-computing.org • Forthcoming • Publication of Web Interface + API • Support of declarative options for automatic rule generation • Integration with more social networks and human computation platforms • Providing vertical solutions for specific markets • More applications and experiments (e.g. in Expo 2015) Crowdsearcher 96
  • 87. QUESTIONS? Crowdsearcher 97