Cross-Lingual Web API Classification and Annotation. Maria Maleshkova, Lukas Zilka, Petr Knoth, Carlos Pedrinaci. 2nd Workshop on the Multilingual Semantic Web, ISWC 201
Services on the Web. “Classical” Web services (WSDL, SOAP, WS-*): enable the publishing and consuming of functionalities of existing applications. Web APIs: enable access to collections of resources.
Why Web APIs?
Challenges with Web APIs. There is no widely used IDL, only textual documentation (HTML). Finding services is hard. Their use requires human interpretation of semi-structured descriptions. The semantics of the services are not described in a machine-processable manner. This prevents automating the discovery, invocation, and composition of Web APIs.
Lightweight Semantics: hRESTS and MicroWSMO. "There's usually an HTML page." Identifying the machine-readable parts: the service and its operations; the resource address and HTTP method; the input/output data format. hRESTS: a microformat. MicroWSMO: extends the hRESTS model with annotations for model references, lifting, and lowering.
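
To make the idea of "identifying machine-readable parts" concrete, the following is a minimal sketch of reading hRESTS markup from an HTML page with Python's standard library. The class names (`service`, `operation`, `label`, `method`, `address`, `input`, `output`) follow the hRESTS microformat; the sample page, its URL, and the helper names `HRestsParser` and `parse_hrests` are assumptions made for illustration and are not taken from the slides.

```python
from html.parser import HTMLParser

# hRESTS class names that carry a text value.
FIELDS = {"label", "method", "address", "input", "output"}

# Hypothetical API documentation page marked up with hRESTS classes.
SAMPLE = """
<div class="service">
  <h1><span class="label">Geocoding Service</span></h1>
  <div class="operation">
    <span class="label">search</span>:
    <span class="method">GET</span>
    <code class="address">http://example.org/search?q={query}</code>
    takes <span class="input">query</span>,
    returns <span class="output">JSON list of places</span>
  </div>
</div>
"""


class HRestsParser(HTMLParser):
    """Collects one service and its operations (assumes no void tags like <br>)."""

    def __init__(self):
        super().__init__()
        self.stack = []  # class sets of the currently open elements
        self.service = {"label": None, "operations": []}

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        if "operation" in classes:
            self.service["operations"].append({})
        self.stack.append(classes)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.stack:
            return
        in_operation = any("operation" in classes for classes in self.stack)
        # Only the innermost open element decides which field the text fills.
        for field in self.stack[-1] & FIELDS:
            if in_operation:
                self.service["operations"][-1][field] = text
            elif field == "label":
                self.service["label"] = text


def parse_hrests(html):
    parser = HRestsParser()
    parser.feed(html)
    parser.close()
    return parser.service


if __name__ == "__main__":
    print(parse_hrests(SAMPLE))
```

The sketch only recovers the syntactic structure. MicroWSMO annotations (model references, lifting, lowering) would be additional attributes on the same elements and are not handled here.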
Annotation Example
Annotation Example (figure labels: Service, Operation, Input Parameter)
Geocoding API http://www.geonames.org/export/web-services.html
Geocoding API http://ondras.zarovi.cz/smap/geokodovani/
Explicit Semantic Analysis (ESA). ESA is a method for assessing the semantic relatedness of two texts. It represents text as a bag of concepts. It uses a background collection. Cross-lingual semantic comparison is possible. Adopting ESA for API classification and annotation: ProgrammableWeb as the background collection, both for API classification and for determining relevant annotations; text preprocessing with service-specific stop-word removal; applying Explicit Semantic Analysis; obtaining categories and central concepts.
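
As an illustration of the bag-of-concepts idea, here is a minimal sketch of ESA in plain Python. Each concept is described by a short text, terms are weighted by TF-IDF per concept, and a text is represented by the sum of the concept weights of its terms. The three concept descriptions are invented toy data standing in for a real background collection, and the function names are assumptions for this sketch.

```python
import math
import re
from collections import Counter

# Toy background collection (hypothetical): concept name -> describing text.
CONCEPTS = {
    "Geocoding": "geocoding converts an address into latitude and longitude "
                 "coordinates on a map",
    "Payment": "payment transfers money between a customer account and a "
               "merchant by credit card",
    "Weather": "weather forecast reports temperature rain and wind for a location",
}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def build_index(concepts):
    """Inverted index: term -> {concept: TF-IDF weight}."""
    n = len(concepts)
    tf = {name: Counter(tokenize(text)) for name, text in concepts.items()}
    df = Counter(term for counts in tf.values() for term in counts)
    index = {}
    for name, counts in tf.items():
        for term, count in counts.items():
            idf = math.log(n / df[term])
            if idf > 0:  # terms occurring in every concept carry no signal
                index.setdefault(term, {})[name] = count * idf
    return index


def esa_vector(text, index):
    """Represent a text as a weighted bag of concepts."""
    vector = Counter()
    for term, count in Counter(tokenize(text)).items():
        for concept, weight in index.get(term, {}).items():
            vector[concept] += count * weight
    return dict(vector)


def cosine(a, b):
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relatedness(text_a, text_b, index):
    return cosine(esa_vector(text_a, index), esa_vector(text_b, index))


if __name__ == "__main__":
    index = build_index(CONCEPTS)
    print(relatedness("find latitude and longitude for an address",
                      "map coordinates of a street address", index))
```

Two texts can be related even when they share few words, as long as their words point to the same concepts. That property is what makes the comparison transferable across languages once concepts are aligned.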
Supporting Cross-Lingual Classification. Determine the language of the description. Remove stop words and project the description into the language-specific concept space of Wikipedia. Project the resulting vector into the English Wikipedia concept space. Determine the most similar vector from the background collection. Determine the category based on the similarity scores.
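
The classification steps can be sketched end to end as follows. Everything in the resource tables is hypothetical toy data: the stop-word lists, the term-to-concept weights, the Czech-to-English concept links, and the two background APIs with their categories. Language detection is approximated by counting stop-word hits, which is an assumption of this sketch and not necessarily the method used by the authors.

```python
import math
import re
from collections import Counter

# Hypothetical toy resources, for illustration only.
STOP_WORDS = {
    "en": {"the", "of", "a", "for", "and", "to"},
    "cs": {"pro", "na", "se", "do", "je"},
}
# Term -> concept weights in each language-specific concept space.
TERM_CONCEPTS = {
    "en": {"address": {"Address": 1.0}, "map": {"Map": 1.0},
           "credit": {"Credit card": 1.0}, "card": {"Credit card": 1.0},
           "money": {"Money": 1.0}},
    "cs": {"adresa": {"Adresa": 1.0}, "mapa": {"Mapa": 1.0},
           "souřadnice": {"Zeměpisné souřadnice": 1.0}},
}
# Concept alignment to the English concept space.
CROSS_LINKS = {
    "cs": {"Adresa": "Address", "Mapa": "Map",
           "Zeměpisné souřadnice": "Geographic coordinate system"},
}
# Background collection: (API name, category, English concept vector).
BACKGROUND = [
    ("Hypothetical Maps API", "Mapping",
     {"Map": 1.0, "Geographic coordinate system": 0.8, "Address": 0.5}),
    ("Hypothetical Pay API", "Payments", {"Credit card": 1.0, "Money": 0.7}),
]


def cosine(a, b):
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect_language(tokens):
    # Step 1: pick the language whose stop words occur most often.
    return max(STOP_WORDS, key=lambda lang: sum(t in STOP_WORDS[lang] for t in tokens))


def to_english_vector(description):
    tokens = re.findall(r"\w+", description.lower())
    lang = detect_language(tokens)
    # Step 2: remove stop words, project into the language-specific space.
    local = Counter()
    for token in tokens:
        if token not in STOP_WORDS[lang]:
            for concept, weight in TERM_CONCEPTS[lang].get(token, {}).items():
                local[concept] += weight
    # Step 3: project into the English concept space.
    english = Counter()
    for concept, weight in local.items():
        target = concept if lang == "en" else CROSS_LINKS[lang].get(concept)
        if target:
            english[target] += weight
    return dict(english)


def classify(description):
    vector = to_english_vector(description)
    # Steps 4 and 5: nearest background API decides the category.
    best = max(BACKGROUND, key=lambda entry: cosine(vector, entry[2]))
    return best[1]


if __name__ == "__main__":
    print(classify("služba pro geokódování: adresa na souřadnice"))
```

In this sketch the category of the single nearest background API is returned. Aggregating the similarity scores of several neighbours per category would be an equally plausible reading of the last step.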
Cross-Lingual API Annotation. Determine the language of the description. Remove stop words and project the description into the language-specific concept space of Wikipedia. Project the resulting vector into the English Wikipedia concept space. Determine the most similar vector from the background collection. Determine the central concepts based on the highest similarity. Figure legend: API; manually assigned; automatically derived.
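
The final annotation step can be read in more than one way. The sketch below shows one plausible interpretation, stated as an assumption: take the few background APIs most similar to the description, and rank their concepts by weight scaled with that similarity. The background vectors and the function name `central_concepts` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical English concept vectors of APIs in the background collection.
BACKGROUND = {
    "maps_api": {"Map": 1.0, "Geographic coordinate system": 0.8, "Address": 0.5},
    "routing_api": {"Map": 0.6, "Route planning": 1.0, "Address": 0.4},
    "pay_api": {"Credit card": 1.0, "Money": 0.7},
}


def cosine(a, b):
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def central_concepts(query, background, neighbours=2, top=3):
    """Rank concepts of the most similar background APIs as annotation candidates."""
    ranked = sorted(background.values(),
                    key=lambda vector: cosine(query, vector), reverse=True)
    scores = Counter()
    for vector in ranked[:neighbours]:
        similarity = cosine(query, vector)
        for concept, weight in vector.items():
            scores[concept] += similarity * weight
    return [concept for concept, score in scores.most_common(top) if score > 0]


if __name__ == "__main__":
    # English concept vector of a description after cross-lingual projection.
    query = {"Address": 1.0, "Geographic coordinate system": 1.0}
    print(central_concepts(query, BACKGROUND))
```

The returned concepts are candidates that a user of an annotation tool could accept or reject, which matches the distinction between manually assigned and automatically derived annotations in the slide's figure.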
Implementation: SWEET
Approach Implementation: SWEET’s architecture
Future Work. Harvest the "surrounding" web to help the classifier. Pattern-based extraction of useful information. Ensure the quality of the background collection.
Conclusion. Finding and using Web APIs is a challenging task. There is a need for cross-lingual support. We provide: a solution, the adoption of the cross-lingual ESA approach; implementations and tool realization. A universal cross-lingual approach, enabling us to support almost any human language.
Thank you!
