AAAI2012 - Crowd Sourcing Web Service Annotations

  1. Crowd Sourcing Web Service Annotations
     James Scicluna¹, Christoph Blank¹, Nathalie Steinmetz¹ and Elena Simperl²
     ¹seekda GmbH, ²Karlsruhe Institute of Technology
     © Copyright 2012 SEEKDA GmbH – www.seekda.com
  2. Outline
     • Introduction to seekda Web Service search engine
     • Web API crawling & identification
     • Amazon Mechanical Turk crowdsourcing
     • Web Service Annotation wizard
  3. seekda Web Service Search Engine (screenshot)
  4. seekda Web Service Search Engine (screenshot)
  5. Why crawl for Web APIs?
     • Significant growth of Web APIs
     • > 5,400 Web APIs on ProgrammableWeb (including SOAP and REST APIs) [end of 2009: ca. 1,500 Web APIs]
     • > 6,500 Mashups on ProgrammableWeb (combining Web APIs from one or more sources)
     • SOAP services are only a small part of the publicly available services
  6. Web API Crawling
     • Problem:
       • Web APIs are described by regular HTML pages
       • There is no standardized structure that helps with identification
  7. Web API Identification
     • Solution: crawl for Web APIs
     • Approach 1: manual feature identification
       • Takes into account HTML structure (e.g., title, mark-up), syntactic properties of the language used (e.g., camel-cased words), and link properties of pages (ratio of external to internal links)
     • Approach 2: automatic classification
       • Text classification with supervised learning (Support Vector Machine model)
       • Training set: APIs from ProgrammableWeb
       • But: human confirmation was still needed to be sure
     (a classifier sketch follows below)
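     The slides give no implementation details for Approach 2, so the following is only a minimal
     text-classification sketch in the same spirit, assuming scikit-learn; the example pages, labels
     and feature choices are hypothetical and not seekda's actual pipeline.

     # Minimal sketch of an SVM-based "is this page a Web API?" classifier (illustrative only).
     # Assumes scikit-learn; `pages` and `labels` stand in for a training set such as
     # ProgrammableWeb-listed API pages (label 1) vs. ordinary web pages (label 0).
     from sklearn.pipeline import Pipeline
     from sklearn.feature_extraction.text import TfidfVectorizer
     from sklearn.svm import LinearSVC

     pages = [
         "GET /weather?city=... returns the forecast as JSON. Authentication via apiKey.",
         "Welcome to our bakery! Opening hours and directions below.",
     ]
     labels = [1, 0]  # 1 = Web API documentation page, 0 = regular page

     classifier = Pipeline([
         ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
         ("svm", LinearSVC()),  # the Support Vector Machine model mentioned on the slide
     ])
     classifier.fit(pages, labels)

     print(classifier.predict(["POST /v1/orders creates a new order (REST endpoint, JSON body)"]))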
  8. New Search Engine Prototype (screenshot)
  9. Prototype – User Contributions
     • Web API – yes/no: confirmation from a human is needed!
     • Other annotations that help improve the search for Web Services:
       • Categories
       • Tags
       • Natural-language descriptions
       • Cost: free or paid service
     (an annotation data-model sketch follows below)
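     A possible shape for such an annotation record is sketched below; the field names and types are
     assumptions made for illustration, not seekda's actual schema.

     # Sketch of an annotation record covering the fields listed on this slide.
     # All names are hypothetical.
     from dataclasses import dataclass, field
     from typing import List

     @dataclass
     class ServiceAnnotation:
         page_url: str                               # the page / service being annotated
         is_web_api: bool                            # human yes/no confirmation
         categories: List[str] = field(default_factory=list)
         tags: List[str] = field(default_factory=list)
         description: str = ""                       # natural-language description
         is_paid: bool = False                       # cost: free or paid service

     example = ServiceAnnotation(
         page_url="http://example.org/weather-api",
         is_web_api=True,
         categories=["Weather"],
         tags=["forecast", "REST"],
         description="Returns 7-day weather forecasts as JSON.",
     )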
  10. Problem – User Contribution
     • Problem:
       • Users/developers don’t contribute enough
       • It is hard to motivate them to provide annotations
       • Community recognition or peer respect is not enough
     • Solution: crowdsource the annotations, i.e. pay people to provide them
       • Use Amazon Mechanical Turk
       • Bootstrap annotations quickly and cheaply
  11. Service Annotation Wizard (1/4) (screenshot)
  12. Service Annotation Wizard (2/4) (screenshot)
  13. Service Annotation Wizard (3/4) (screenshot)
  14. Service Annotation Wizard (4/4) (screenshot)
  15. Amazon Mechanical Turk – Iteration 1
     Number of submissions: 70
     Reward per task: $0.10
     Restrictions: none
     • Annotation Wizard task:
       • Web API yes/no
       • Assign a category
       • Assign tags
       • Provide a natural-language description
       • Determine whether the page is documentation, pricing or a listing
       • Rate the service
     (a HIT-creation sketch follows below)
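     As a rough illustration of how such a task could be published with the parameters on this slide
     ($0.10 reward, no worker restrictions), the sketch below uses the present-day boto3 MTurk client;
     the wizard URL, HIT texts and AWS setup are assumptions, not the configuration actually used.

     # Sketch only: publish one annotation task as an MTurk HIT via boto3.
     import boto3

     mturk = boto3.client("mturk", region_name="us-east-1")

     # An ExternalQuestion embeds the (hypothetical) annotation wizard inside the HIT frame.
     question_xml = """
     <ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
       <ExternalURL>https://example.org/annotation-wizard?service=42</ExternalURL>
       <FrameHeight>800</FrameHeight>
     </ExternalQuestion>
     """

     hit = mturk.create_hit(
         Title="Is this page a Web API? Annotate it",
         Description="Decide whether the page describes a Web API; add a category, tags and a short description.",
         Keywords="web api, annotation, tagging",
         Reward="0.10",                     # reward per task, as on the slide
         MaxAssignments=1,
         AssignmentDurationInSeconds=600,
         LifetimeInSeconds=7 * 24 * 3600,
         Question=question_xml,             # no QualificationRequirements: iteration 1 had no restrictions
     )
     print(hit["HIT"]["HITId"])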
  16. Amazon Mechanical Turk – Iteration 1
     • Results:
       • 21 APIs correctly identified as APIs
       • 28 web documents (non-APIs) correctly identified as non-APIs
       • 49/70 correctly identified (70% accuracy)
       • Average task completion time: 2:20 min
     • But, only:
       • 4 well-done & complete annotations
       • 8 acceptable (incomplete) annotations
  17. Amazon Mechanical Turk – Iterations 2 & 3
                                 Iteration 2   Iteration 3
     Number of submissions           100           150
     Reward per task                $0.20         $0.20
     Restrictions                    yes           yes
     • Annotation Wizard:
       • Removed page-type identification & service rating
     • For a task to be accepted:
       • At least one category must be assigned
       • At least 2 tags must be provided
       • A meaningful description must be provided
     (a validation sketch follows below)
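     The acceptance rules above translate directly into a small check; in the sketch below the
     category and tag thresholds come from the slide, while the test for a "meaningful description"
     (a minimum word count) is an assumption made for illustration.

     # Sketch of the task-acceptance rules from this slide; the word-count threshold is assumed.
     from typing import List

     def is_acceptable(categories: List[str], tags: List[str], description: str,
                       min_description_words: int = 5) -> bool:
         if len(categories) < 1:              # at least one category must be assigned
             return False
         if len(tags) < 2:                    # at least 2 tags must be provided
             return False
         # crude stand-in for "a meaningful description must be provided"
         return len(description.split()) >= min_description_words

     print(is_acceptable(["Weather"], ["forecast", "REST"],
                         "Returns 7-day weather forecasts for a given city."))  # True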
  18. Amazon Mechanical Turk – Iterations 2 & 3
     • Results for iterations 2 & 3:
       • Ca. 80% of documents correctly identified
       • Very satisfying annotations
       • Average completion time: 2:36 min
  19. Amazon Mechanical Turk – Survey
     • 48 survey submissions
       • Female: 18, male: 30
       • Most common countries of origin: India (27) and USA (9)
       • Most common age groups: 15–22 (12), 23–30 (18), 31–50 (16)
       • Most respondents worked in an IT profession
         • These workers provided the best-quality annotations
  20. Amazon Mechanical Turk
     • Recommendations for further improvement:
       • Improve the task description, especially ‘what is a Web API’
       • Give better examples (e.g., hinting at what makes a false page false)
       • Allow assignment of multiple categories
     • Conclusion:
       • Very positive results → a good way to get quality annotations
       • The results will help provide a better search experience to users
       • The results can be used as a positive set for automatic classification
  21. Questions?
