Crowd Sourcing Web Service Annotations
Presentation at the AAAI 2012 Spring Symposium: Intelligent Web Services Meet Social Computing, Palo Alto, CA, United States. Paper: Crowd Sourcing Web Service Annotations. Authors: James Scicluna, Christoph Blank and Nathalie Steinmetz.

    Presentation Transcript

    • Crowd Sourcing Web Service Annotations
      James Scicluna¹, Christoph Blank¹, Nathalie Steinmetz¹ and Elena Simperl²
      ¹seekda GmbH, ²Karlsruhe Institute of Technology
      © Copyright 2012 SEEKDA GmbH – www.seekda.com
    • Outline
      – Introduction to the seekda Web Service search engine
      – Web API crawling & identification
      – Amazon Mechanical Turk crowdsourcing
      – Web Service Annotation wizard
    • seekda Web Service Search Engine (screenshot slides)
    • Why crawl for Web APIs?
      – Significant growth of Web APIs
      – > 5,400 Web APIs on ProgrammableWeb (including SOAP and REST APIs); at the end of 2009 there were ca. 1,500
      – > 6,500 mashups on ProgrammableWeb (combining Web APIs from one or more sources)
      – SOAP services are only a small part of all publicly available services
    • Web API Crawling
      – Problem: Web APIs are described by regular HTML pages
      – There is no standardized structure that helps with identification
    • Web API Identification
      – Solution: crawl for Web APIs
      – Approach 1: manual feature identification, taking into account HTML structure (e.g., title, mark-up), syntactical properties of the language used (e.g., camel-cased words), and link properties of pages (ratio of external to internal links)
      – Approach 2: automatic classification via text classification with supervised learning (a Support Vector Machine model), trained on APIs from ProgrammableWeb
      – But: human confirmation was still needed to be sure
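    The hand-crafted features of Approach 1 can be sketched in a few lines of Python. Everything below (function names, the camel-case regex, the exact ratio definition) is illustrative and an assumption on my part; the slides only name the feature families, not seekda's actual implementation.

    ```python
    import re
    from html.parser import HTMLParser
    from urllib.parse import urlparse

    # Hypothetical camel-case detector, e.g. matches "getUserInfo".
    CAMEL_CASE = re.compile(r"\b[a-z]+(?:[A-Z][a-z0-9]+)+\b")

    class LinkCollector(HTMLParser):
        """Collects href targets from <a> tags on a page."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def page_features(html, page_host):
        """Return two of the slide's feature families for one page:
        a camel-case word count and an external/internal link ratio."""
        camel_words = CAMEL_CASE.findall(html)
        collector = LinkCollector()
        collector.feed(html)
        external = internal = 0
        for href in collector.links:
            host = urlparse(href).netloc
            if host and host != page_host:
                external += 1   # link points to another host
            else:
                internal += 1   # relative link or same host
        ratio = external / internal if internal else float(external > 0)
        return {"camel_case_count": len(camel_words), "link_ratio": ratio}
    ```

    Such features would then feed a heuristic score or, for Approach 2, a supervised classifier such as the SVM mentioned on the slide.
    
    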
    • New Search Engine Prototype (screenshot slide)
    • Prototype – User Contributions
      – Web API – yes/no: confirmation from a human is needed!
      – Other annotations that help improve the search for Web Services: categories, tags, natural-language descriptions
      – Cost: free or paid service
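    The contribution fields listed on this slide amount to a small annotation record per candidate service. A minimal sketch, assuming Python dataclasses; the field names are illustrative, not seekda's actual schema.

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class ServiceAnnotation:
        """One user-contributed annotation for a crawled candidate API page."""
        url: str
        is_web_api: bool                          # the yes/no human confirmation
        categories: list = field(default_factory=list)
        tags: list = field(default_factory=list)
        description: str = ""                     # natural-language description
        is_free: bool = True                      # cost: free or paid service
    ```
    
    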
    • Problem – User Contributions
      – Problem: users/developers don't contribute enough; it is hard to motivate them to provide annotations; community recognition or peer respect is not enough
      – Solution: crowdsource the annotations, i.e., pay people to provide them
      – Use Amazon Mechanical Turk to bootstrap annotations quickly and cheaply
    • Service Annotation Wizard (screenshot slides 1/4–4/4)
    • Amazon Mechanical Turk – Iteration 1
      – Setup: 70 submissions, $0.10 reward per task, no restrictions
      – Annotation wizard tasks: Web API yes/no; assign a category; assign tags; provide a natural-language description; determine whether the page is documentation, pricing or a listing; rate the service
    • Amazon Mechanical Turk – Iteration 1  Results  21 APIs correctly identified as APIs  28 Web documents (non APIs) identified correctly as non APIs  49/70 correctly identified (70% accuracy)  Average task completion time: 2:20 min  But, only:  4 well done & complete annotations  8 acceptable annotations (non complete) 16© Copyright 2012 SEEKDA GmbH – www.seekda.com
    • Amazon Mechanical Turk – Iterations 2 & 3
      – Setup: 100 submissions (iteration 2) and 150 submissions (iteration 3), $0.20 reward per task, with restrictions
      – Annotation wizard: removed page-type identification and service rating
      – For a task to be accepted: at least one category must be assigned, at least two tags must be provided, and a meaningful description must be given
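    The acceptance rules introduced in iterations 2 and 3 are easy to express as a validation check. A minimal sketch: the first two conditions come straight from the slide, while the word-count threshold for a "meaningful" description is my own stand-in, since the slides don't say how meaningfulness was judged.

    ```python
    def task_accepted(categories, tags, description, min_desc_words=3):
        """Apply the slide's task-acceptance rules:
        >= 1 category, >= 2 tags, and a non-trivial description.
        The min_desc_words threshold is an assumption, not from the paper."""
        return (
            len(categories) >= 1
            and len(tags) >= 2
            and len(description.split()) >= min_desc_words
        )
    ```
    
    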
    • Amazon Mechanical Turk – Iterations 2 & 3: Results
      – Ca. 80% of documents correctly identified
      – Very satisfactory annotations
      – Average completion time: 2:36 min
    • Amazon Mechanical Turk – Survey
      – 48 survey submissions: 18 female, 30 male
      – Most common countries of origin: India (27) and the USA (9)
      – Age groups: 15–22 (12), 23–30 (18), 31–50 (16)
      – Most respondents worked in an IT profession; they provided the best-quality annotations
    • Amazon Mechanical Turk  Recommendations for further improvement:  Improve task description, especially ‘what is a Web API’  Better examples (e.g., hinting what makes a false page false)  Allow assignment of multiple categories  Conclusion:  Very positive results  good way to get quality annotations  Results will help provide better search experience to users  Results can be used as positive set for automatic classification 20© Copyright 2012 SEEKDA GmbH – www.seekda.com
    • Questions?