Harnessing the Crowds for Automating the Identification of Web APIs

Supporting the efficient discovery and use of Web APIs is increasingly important as their use and popularity grow. Yet a simple task like finding potentially interesting APIs and their related documentation turns out to be hard and time-consuming, even when using the best resources currently available on the Web. We describe our research towards an automated Web API documentation crawler and search engine. We have devised and exploited crowdsourcing techniques to generate a curated dataset of Web API documentation. Thanks to this dataset, we have devised an engine able to automatically detect documentation pages. Our preliminary experiments have shown that we obtain an accuracy of 80% and a precision increase of 15 points over a keyword-based heuristic we have used as baseline.
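The keyword-based baseline mentioned in the abstract is described in the slides as "the occurrence of 3 or more keywords", with "api, input, output, GET, PUT, etc" as examples. A minimal sketch of such a heuristic might look like the following. The keyword set is limited to the five terms the slides name (the full list is not given), and counting distinct keywords with case-insensitive matching is an assumption, as is the function name.

```python
import re

# Keywords named in the slides; the full list is not given there,
# so this set is illustrative only.
KEYWORDS = {"api", "input", "output", "get", "put"}

# "the occurrence of 3 or more keywords" (slide 17).
THRESHOLD = 3


def looks_like_api_documentation(page_text: str) -> bool:
    """Return True when at least THRESHOLD distinct keywords occur in the text.

    Assumption: matching is case-insensitive and each keyword counts once.
    """
    tokens = set(re.findall(r"[a-z]+", page_text.lower()))
    return len(KEYWORDS & tokens) >= THRESHOLD
```

A heuristic of this kind is cheap to run but blind to context, which fits its role as a baseline for the trained classifiers.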

Slide notes:
  • A Web API - the main page
  • A Web API - the documentation page
  • A Web API - the offered functionality and data (example of an invocation and the XML obtained)
  • Google?
  • Prog Web. Issues:
      - requires manual registration
      - gets out of date (example)
      - discovery remains at a very coarse grain (manual categorisation, or some keywords)
      - no notion of operations and resources provided, etc.
  • Some example documentation pages
  • Harnessing the Crowds for Automating the Identification of Web APIs

    1. Harnessing the Crowds for Automating the Identification of Web APIs. Carlos Pedrinaci, Chenghua Lin, Dong Liu, John Domingue. KMi, The Open University
    2. Web APIs are the new Web services
        • Publicly offering valuable data and functionality
        • Widely used and reused
        • Although their use is hardly automated
    3. Web APIs and RESTful Services
        • Services based on a simple(r) stack of technologies than WS-*
            • Roughly URL + HTTP + XML/JSON
        • Easy way to provide a programmatic interface to existing Web sites
        • Seldom adopt REST principles
    4. How to Discover Web APIs?
    5. Poor Results
    6. OK Results
    7. Poor Results
    8. Out of date
    9. Issues for Discovering Web APIs
        • There is no simple way to effectively and uniquely identify Web APIs
            • No standardised document describing the interface
            • URLs are hardly usable for this end
    10. How can we automatically find Web APIs?
    11. Hypothesis
        • Every Web API provides a/several public documentation page(s)
        • These pages provide the most relevant information for developers
        ‣ Web API location can be approached as a documentation discovery problem
    12. Web API Identification
        • Given a Web page, determine if it documents an API or not
        • Sometimes a hard problem even for humans
    13. Collecting Documentation Pages
        • Harnessing the crowds for detecting documentation pages
    14. Generating a Curated Dataset
        • Often the links are obsolete or point to general pages
    15. Dataset Generated
        • We used API Validator to process 1,872 APIs from ProgrammableWeb
            • 43% of the URLs we started with (data from 2010)
            • 624 a documentation page
            • 929 not a documentation page
            • 318 skipped (server down or unclear)
    16. Web API Identification Engine
        • Web API identification as a binary classification problem
        • Extract core features from Web pages
        • Use machine learning algorithms to provide an identification engine
    17. Preliminary Experiment
        • Used initially only Web page words as a feature
        • Trained two classifiers, NB and SVM
        • Used a simple keyword-based heuristic as baseline for comparison (the occurrence of 3 or more keywords)
            • api, input, output, GET, PUT, etc
    18. Evaluation Results (all values in %)

        Model     Precision   Recall   F1     Accuracy
        Keyword   60.3        75.7     67.0   70.2
        NB        71.0        79.2     74.8   78.6
        SVM       75.4        70.8     73.1   79.0

    19. Evaluation Results
        • Although preliminary, the approach already provides promising results
        • Both NB and SVM provide a good accuracy (about 80%)
        • Best precision (75.4%) achieved by SVM, which is 15 points better than the baseline
    20. Conclusions and Future Work
        • Discovering Web APIs is becoming increasingly important and existing support is not optimal
        • Web API identification is a first step that can well be approached as a documentation identification problem
        • Crowd input (ProgWeb and API Validator) has been essential
    21. Conclusions and Future Work
        • Further features are being included to improve the results
            • Title, URL, presence of camelCase words
            • Current tests have reached an accuracy of 82% using SGD
    22. Conclusions and Future Work
        • A larger training set is necessary
            • Need more validated pages (help!)
            • http://iserve-dev.kmi.open.ac.uk/validator/
        • A larger experiment will be carried out over a normal Web crawl
    23. Thanks for your attention
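
As a cross-check on the evaluation table in slide 18, F1 can be recomputed from the reported precision and recall using the standard harmonic-mean definition. The sketch below copies the three rows from the table; the function and variable names are illustrative, not taken from the authors' code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in percent, copied from slide 18.
REPORTED = {
    "Keyword": (60.3, 75.7, 67.0),
    "NB": (71.0, 79.2, 74.8),
    "SVM": (75.4, 70.8, 73.1),
}

for model, (precision, recall, reported_f1) in REPORTED.items():
    recomputed = f1_score(precision, recall)
    print(f"{model}: reported F1 {reported_f1:.1f}, recomputed {recomputed:.1f}")
```

Because the inputs are themselves rounded to one decimal, the recomputed values can differ from the reported ones by a fraction of a point; for all three rows the difference stays below 0.2.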