seekda's Web Service search engine
 


Presentation at the Semantic Web meetup in Seattle, WA, USA, in March 2012: http://www.meetup.com/Semantically-Webbed-Seattle-Meetup-Group/events/52635992/

    Presentation Transcript

    • seekda's Web Service Search Engine
      • Nathalie Steinmetz, seekda GmbH
      • © Copyright 2012 SEEKDA GmbH – www.seekda.com
    • seekda Web Service Search Engine
    • Motivation
      • “Web of services”
        • Growing amount of public services & data on the Web
      • Problem: How do I find the service I need?
        • General search engine: services hard to identify, not much information on results page
        • Specific portals: access to restricted sets of registered and editorially maintained services
      • Use semantic technologies for better search experience
        • No to heavy-weight, expressive semantic web service languages such as OWL-S or WSML
        • Yes to simple light-weight semantic annotations in RDF
        • Scalability!
    • Outline
      • Web Service search engine - basics
        • Focused Crawling
          • WSDL-based services
          • Web APIs
        • Seekda's search engine & experimental prototype
      • Crowdsourcing Web Service annotations
        • Web Service Annotation wizard
        • Amazon Mechanical Turk crowdsourcing
      • Service ontologies
    • Service Location
      • Locating Web Services on the Web (approach adopted by the European projects Service-Finder & SOA4All)
        • Crawling the Web for services
        • Aggregate information
        • Annotate services
      • Supported services:
        • WSDL descriptions
        • Web APIs (a.k.a. RESTful services)
    • Service Crawler Architecture (diagram; labels: Crawl Operator, Collecting Seeds, Configuration & Monitoring, Crawling, RDF meta-data, Data Post-Processing, ARCs, Index)
    • Crawling the Web for Services
      • Basic crawling process:
        • Start with a set of seed URLs
        • Check whether a page should be fetched or not
        • Fetch the document the URL points to
        • Extract links from the fetched document
        • Decide whether or not to store fetched documents
        • Feed crawler queues with newly extracted links
        • Assign costs/priorities to single URLs and queues
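
The basic crawling process listed on this slide can be sketched as a single loop. This is a minimal illustration, not seekda's crawler: the function name `crawl`, the callback parameters (`fetch`, `should_fetch`, `should_store`, `cost`) and the regex-based link extraction are all assumptions made for the example, and a single priority queue stands in for the per-host queues.

```python
import heapq
import re
from urllib.parse import urljoin

# Simplistic link extraction; a real crawler would use an HTML parser.
LINK_RE = re.compile(r'href="([^"]+)"')


def crawl(seeds, fetch, should_fetch, should_store, cost, max_pages=100):
    """Run the basic crawl loop and return the URLs of stored documents.

    All callbacks are hypothetical hooks:
    fetch(url) returns the document text or None,
    should_fetch(url) and should_store(url, document) return booleans,
    cost(url) returns a number (lower cost is crawled first).
    """
    frontier = [(cost(url), url) for url in seeds]  # start with the seed URLs
    heapq.heapify(frontier)
    seen = set(seeds)
    stored = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        if not should_fetch(url):          # check whether to fetch at all
            continue
        document = fetch(url)              # fetch the document
        fetched += 1
        if document is None:
            continue
        if should_store(url, document):    # decide whether to store it
            stored.append(url)
        for href in LINK_RE.findall(document):  # extract links
            link = urljoin(url, href)
            if link not in seen:           # feed the queue with new links
                seen.add(link)
                heapq.heappush(frontier, (cost(link), link))
    return stored
```

Passing `fetch` as a parameter keeps the sketch independent of any HTTP library; in tests it can simply be the `get` method of a dictionary that maps URLs to page contents.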
    • Focused Crawling Techniques
      • Seed Collection
        • Collecting seeds from specialized portals
        • Reuse known Web Service descriptions and related documents
      • URL Scheduling
        • Use clever means to prioritize URLs to focus the crawls on the relevant part of the Web
        • Assign costs that influence the priority of a URL in a queue
        • Based on:
          • Building term vectors of pages to assess similarity to the WS domain
          • URL characteristics
      • Queue Scheduling
        • One queue per host
        • Prioritize queues with low-cost URLs
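
A rough sketch of the URL and queue scheduling described on this slide: one queue per host, and the next URL is taken from the queue whose cheapest URL has the lowest cost. The slides do not say which URL characteristics are used, so the `CHEAP_HINTS` list, the cost formula and the class name `HostQueues` are purely illustrative assumptions.

```python
import heapq
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical URL hints; the actual characteristics are not given in the slides.
CHEAP_HINTS = ("wsdl", "asmx", "service", "api")


def url_cost(url):
    """Lower cost means more promising; each matching hint lowers the cost by one."""
    lowered = url.lower()
    return 10 - sum(hint in lowered for hint in CHEAP_HINTS)


class HostQueues:
    """One priority queue per host; hosts with low-cost URLs are served first."""

    def __init__(self):
        self._queues = defaultdict(list)

    def add(self, url):
        host = urlparse(url).netloc
        heapq.heappush(self._queues[host], (url_cost(url), url))

    def next_url(self):
        # Compare the cheapest entry of every non-empty host queue.
        candidates = [(queue[0][0], host) for host, queue in self._queues.items() if queue]
        if not candidates:
            return None
        _, host = min(candidates)
        return heapq.heappop(self._queues[host])[1]
```

Keeping a separate queue per host also makes it straightforward to add per-host politeness delays later, although that is not shown here.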
    • Identify WSDLs and Related Information
      • WSDL identification
        • Check whether a fetched page is XML and valid WSDL
      • Related documents identification
        • Definition of related document
          • Inlink to the WSDL
          • Outlink from the WSDL
          • Associated by term vector similarity
      • Task split between crawl run-time and post-processing of the crawl data
      • Task implies the deeper crawling of service provider domains
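
The two checks on this slide can be approximated as follows. This is a simplified sketch under stated assumptions: `looks_like_wsdl` only tests that the page parses as XML and that its root element is the WSDL 1.1 `definitions` or WSDL 2.0 `description` element, which is weaker than full schema validation; `term_vector` and `cosine_similarity` are one common way to compare term vectors, not necessarily the measure used by the crawler.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Root elements in Clark notation ({namespace}localname).
WSDL_ROOTS = {
    "{http://schemas.xmlsoap.org/wsdl/}definitions",  # WSDL 1.1
    "{http://www.w3.org/ns/wsdl}description",         # WSDL 2.0
}


def looks_like_wsdl(document):
    """Return True if the document is well-formed XML with a WSDL root element."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError:
        return False
    return root.tag in WSDL_ROOTS


def term_vector(text):
    """Count lower-cased alphabetic terms in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two term vectors; 0.0 if either vector is empty."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

A related-document candidate could then be kept when its similarity to the WSDL's term vector exceeds some threshold; the threshold value would have to be tuned and is not stated in the slides.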
    • Unique Service Objects
      • Building unique service objects
        • Collect all similar WSDLs → deduplication
        • One service = all WSDLs with same provider and service
      • Example:
        • Unique Service: http://seekda.com/providers/cdyne.com/IP2Geo
        • Endpoint: http://ws.cdyne.com/ip2geo/ip2geo.asmx
        • Provider: cdyne.com
        • Service: IP2Geo
        • WSDLs:
          • http://ws.cdyne.com/ip2geo/ip2geo.asmx?wsdl
          • http://miki2005.uda.ad/p1net/Web%20References/com.cdyne.ws/ip2geo.wsdl
          • ...
      • Create unique service identifiers:
        • http://seekda.com/providers/<providerName>/<serviceName>
      • Assemble related information
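
The deduplication step can be illustrated with the example from this slide. The sketch assumes that the endpoint and service name have already been extracted from each WSDL, and it derives the provider naively from the last two labels of the endpoint host; both the helper names (`provider_of`, `group_services`) and that derivation rule are assumptions, not the documented seekda logic.

```python
from collections import defaultdict
from urllib.parse import urlparse


def provider_of(endpoint):
    """Naive assumption: the provider is the last two labels of the endpoint host."""
    host = urlparse(endpoint).hostname or ""
    return ".".join(host.split(".")[-2:])


def group_services(wsdl_records):
    """Group WSDL URLs under one identifier per (provider, service) pair.

    wsdl_records is an iterable of (wsdl_url, endpoint, service_name) tuples.
    """
    services = defaultdict(list)
    for wsdl_url, endpoint, service_name in wsdl_records:
        service_id = f"http://seekda.com/providers/{provider_of(endpoint)}/{service_name}"
        services[service_id].append(wsdl_url)
    return dict(services)


records = [
    ("http://ws.cdyne.com/ip2geo/ip2geo.asmx?wsdl",
     "http://ws.cdyne.com/ip2geo/ip2geo.asmx", "IP2Geo"),
    ("http://miki2005.uda.ad/p1net/Web%20References/com.cdyne.ws/ip2geo.wsdl",
     "http://ws.cdyne.com/ip2geo/ip2geo.asmx", "IP2Geo"),
]
unique_services = group_services(records)
```

Both WSDL copies end up under `http://seekda.com/providers/cdyne.com/IP2Geo`, because the grouping key comes from the endpoint declared inside the WSDL rather than from the host where the copy was found.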
    • Search Results
    • Service Overview
    • seekda Web Service Search Engine
    • Why crawl for Web APIs?
      • Significant growth of Web APIs
        • > 5,400 Web APIs on ProgrammableWeb (including SOAP and REST APIs) [end of 2009: ca. 1,500 Web APIs]
        • > 6,500 Mashups on ProgrammableWeb (combining Web APIs from one or more sources)
      • SOAP services are only a small part of the overall available public services
    • Web API – Example (1/3)
    • Web API – Example (2/3)
    • Web API – Example (3/3)
      • Problem:
        • Web APIs are described by regular HTML pages
        • No standardized structure that helps with the identification
    • Web API Identification
      • Solution: Crawl for Web APIs
      • Approach 1: Manual Feature Identification
        • Taking into account HTML structure (e.g., title, mark-up), syntactical properties of the language used (e.g., camel-cased words), and link properties of pages (ratio of external to internal links)
      • Approach 2: Automatic Classification
        • Text classification, supervised learning (Support Vector Machine model)
        • Training set: APIs from ProgrammableWeb
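
Two of the manual features named on this slide, camel-cased words and the ratio of external to internal links, could be computed roughly as below. The regular expression, the feature names and the function `page_features` are illustrative assumptions; how the features are weighted or thresholded is not described in the slides.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Lower-case start followed by one or more capitalised parts, e.g. getLocation.
CAMEL_CASE = re.compile(r"\b[a-z]+(?:[A-Z][a-z0-9]+)+\b")


class _LinkAndTextCollector(HTMLParser):
    """Collect href values of <a> tags and all text nodes of a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_data(self, data):
        self.text.append(data)


def page_features(page_url, html):
    """Return a small feature dictionary for one HTML page."""
    parser = _LinkAndTextCollector()
    parser.feed(html)
    host = urlparse(page_url).netloc
    # A link is external if it resolves to a different host than the page.
    external = sum(urlparse(urljoin(page_url, h)).netloc != host for h in parser.hrefs)
    internal = len(parser.hrefs) - external
    text = " ".join(parser.text)
    return {
        "camel_case_words": len(CAMEL_CASE.findall(text)),
        "external_internal_ratio": external / internal if internal else float(external),
    }
```

Feature dictionaries of this kind could also serve as extra input for the supervised classifier of Approach 2, alongside plain text features.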
    • Unique Service Objects – Web APIs
      • Create unique identifiers:
        • Again using the provider name (from the Web API homepage)
        • We do not know the service name → hash value of URL instead
        • http://seekda.com/providers/<providerName>/<hashValueOfURL>
      • But: human confirmation is still needed to be sure
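
A minimal sketch of the identifier scheme for Web APIs. The slides do not name the hash function or say how the provider name is normalised, so SHA-1 and the stripping of a leading `www.` are assumptions chosen only for illustration, as is the function name `web_api_identifier`.

```python
import hashlib
from urllib.parse import urlparse


def web_api_identifier(api_url):
    """Build http://seekda.com/providers/<providerName>/<hashValueOfURL>."""
    host = urlparse(api_url).hostname or ""
    # Assumption: the provider name is the homepage host without a leading "www.".
    provider = host[4:] if host.startswith("www.") else host
    # Assumption: SHA-1 hex digest of the full URL serves as the hash value.
    url_hash = hashlib.sha1(api_url.encode("utf-8")).hexdigest()
    return f"http://seekda.com/providers/{provider}/{url_hash}"
```

Because the hash depends only on the URL, crawling the same page again yields the same identifier, while two documentation pages of the same API get different identifiers, which is one reason human confirmation is still useful.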
    • New Search Engine Prototype
    • Prototype – User Contributions
      • Web API – yes/no: confirmation from human needed!
      • Other annotations that help improve the search for Web Services
        • Categories
        • Tags
        • Natural Language descriptions
        • Cost: Free or paid service
    • Problem - User Contribution
      • Problem:
        • Users/developers don’t contribute enough
        • Hard to motivate them to provide annotations
        • Community recognition or peer respect not enough
      • Solution: crowdsourcing the annotations, pay people to provide annotations
        • Use Amazon Mechanical Turk
        • Bootstrap annotations quickly and cheaply
    • Service Annotation Wizard (1/4)
    • Service Annotation Wizard (2/4)
    • Service Annotation Wizard (3/4)
    • Service Annotation Wizard (4/4)
    • Amazon Mechanical Turk – Iteration 1
      • Number of submissions: 70
      • Reward per task: $0.10
      • Restrictions: none
      • Annotation Wizard
        • Web API Yes/No
        • Assign a category
        • Assign tags
        • Provide a natural language description
        • Determine whether page is documentation, pricing or listing
        • Rate the service
    • Amazon Mechanical Turk – Iteration 1
      • Results
        • 21 APIs correctly identified as APIs
        • 28 Web documents (non-APIs) correctly identified as non-APIs
        • 49/70 correctly identified (70% accuracy)
        • Average task completion time: 2:20 min
      • But, only:
        • 4 well-done, complete annotations
        • 8 acceptable annotations (incomplete)
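
The figures on this slide can be checked with a few lines of arithmetic. The share of usable annotations (well-done plus acceptable, out of all submissions) is not stated in the slides; it is derived here from the reported counts.

```python
def accuracy(true_positives, true_negatives, total):
    """Fraction of submissions that classified the page correctly."""
    return (true_positives + true_negatives) / total


# 21 APIs and 28 non-APIs correctly identified out of 70 submissions.
iteration_1 = accuracy(true_positives=21, true_negatives=28, total=70)

# 4 complete plus 8 acceptable annotations out of 70 submissions.
usable_share = (4 + 8) / 70

print(f"accuracy: {iteration_1:.0%}")             # accuracy: 70%
print(f"usable annotations: {usable_share:.0%}")  # usable annotations: 17%
```

The gap between 70% classification accuracy and roughly 17% usable annotations is what motivates the stricter acceptance rules in the later iterations.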
    • Amazon Mechanical Turk – Iterations 2 & 3
      • Number of submissions: 100 (Iteration 2), 150 (Iteration 3)
      • Reward per task: $0.20 (Iteration 2), $0.20 (Iteration 3)
      • Restrictions: yes (Iteration 2), yes (Iteration 3)
      • Annotation Wizard
        • Removed page type identification & service rating
      • For a task to be accepted:
        • At least one category must be assigned
        • At least 2 tags must be provided
        • A meaningful description must be provided
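
The acceptance rules for iterations 2 and 3 could be pre-checked automatically along these lines. Whether a description is "meaningful" needs human judgement; the minimum word count used here (`min_words`) is only a hypothetical stand-in, and the function name `accept_submission` is likewise an assumption.

```python
def accept_submission(categories, tags, description, min_words=5):
    """Return True if a submission passes the three acceptance rules.

    min_words is a hypothetical proxy for "meaningful description".
    """
    has_category = len(categories) >= 1
    # Count distinct, non-empty tags, ignoring case and surrounding spaces.
    distinct_tags = {tag.strip().lower() for tag in tags if tag.strip()}
    has_tags = len(distinct_tags) >= 2
    has_description = len(description.split()) >= min_words
    return has_category and has_tags and has_description
```

For example, `accept_submission(["Geo"], ["ip", "location"], "Looks up the location of an IP address")` passes, while a submission with duplicate tags such as `["ip", "IP"]` does not.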
    • Amazon Mechanical Turk – Iterations 2 & 3
      • Results Iterations 2 & 3:
        • Ca. 80% of documents correctly identified
        • Very satisfying annotations
        • Average completion time: 2:36 min
    • Amazon Mechanical Turk – Survey
      • 48 survey submissions
        • Female 18, Male 30
        • Most popular origins: India (27) and USA (9)
        • Popular age groups:
          • 15-22 (12)
          • 23-30 (18)
          • 31-50 (16)
      • Most of them worked in some IT profession
        • Provided best quality annotations
    • Amazon Mechanical Turk
      • Recommendations for further improvement:
        • Improve task description, especially ‘what is a Web API’
        • Better examples (e.g., hinting what makes a false page false)
        • Allow assignment of multiple categories
        • Restrict to workers in IT professions?
      • Conclusion:
        • Very positive results → good way to get quality annotations
        • Results will help provide better search experience to users
        • Results can be used as positive set for automatic classification
    • Service Ontologies (1/2)
    • Service Ontologies (2/2): http://www.service-finder.eu/ontologies/ServiceCategories
    • Questions?