
smartAPIs: EUDAT Semantic Working Group Presentation @ RDA 9th Plenary

smartAPIs are an approach to the incremental, machine-aided, semantic annotation of Web APIs. Starting from existing, popular standards, we will provide enhanced tools for authoring ever-richer metadata, guided by global community knowledge encapsulated in ontologies, and aided by "smart suggestions" based on mining the metadata from previous API specifications.

The project is led by Michel Dumontier (Maastricht University). This presentation was given on his behalf by Mark Wilkinson (UPM, Madrid; Spanish Ministerio de Economía y Competitividad grant number TIN2014-55993-R).


  1. 1. 1 @micheldumontier & @markmoby The smartAPI project Mark D. Wilkinson Center for Plant Biotechnology and Genomics UPM-INIA, Madrid On behalf of Michel Dumontier Maastricht University Discovering interconnected web APIs with semantic metadata
  2. 2. 2 @micheldumontier & @markmoby • Biomedical data analysis is increasingly being done using cloud-based, web-friendly application programming interfaces (APIs). • BUT it’s pretty much impossible to automatically discover which API to use and how to connect these together to create an effective workflow. Background
  3. 3. 3 @micheldumontier & @markmoby API Catalogs 17,202 APIs 1,187 APIs 6206 APIs 15,128 APIs SHARE Registry
  4. 4. 4 @micheldumontier & @markmoby Variable Metadata
  5. 5. 5 @micheldumontier & @markmoby Variable Metadata
  6. 6. 6 @micheldumontier & @markmoby Variable Metadata
  7. 7. 7 @micheldumontier & @markmoby Variable Metadata
  8. 8. 8 @micheldumontier & @markmoby
  9. 9. 9 @micheldumontier & @markmoby The parameter called “sequence” can have values that are FASTA formatted sequences
  10. 10. 10 @micheldumontier & @markmoby The average bioinformatician can traverse these links, read these API documents, and make reasonably good guesses about how to access the service But this is limited to the speed and patience of a human
  11. 11. 11 @micheldumontier & @markmoby Meanwhile, in another registry…
  12. 12. 12 @micheldumontier & @markmoby Variable Metadata
  13. 13. 13 @micheldumontier & @markmoby Variable Metadata Different metadata fields describing ~the same operation (BLAST)
  14. 14. 14 @micheldumontier & @markmoby Variable Metadata
  15. 15. 15 @micheldumontier & @markmoby Variable Metadata In this case, the parameter is called “QUERY”, and it can consume an Accession (…???...), a “GI”, or a FASTA formatted sequence
  16. 16. 16 @micheldumontier & @markmoby If you really work and dig-around A human can use Service Registries to find most of the information they need (though they still need experience and/or guesswork!)
  17. 17. 17 @micheldumontier & @markmoby Weak or absent input/output descriptors make pipelining of services difficult based solely on registry metadata
  18. 18. 18 @micheldumontier & @markmoby Weak or absent input/output descriptors And even with ~well-described services pipelining remains troublesome
  19. 19. 19 @micheldumontier & @markmoby
  20. 20. 20 @micheldumontier & @markmoby myGene.info: Input parameters (described using the openAPI descriptor standard)
  21. 21. 21 @micheldumontier & @markmoby myGene.info: Input parameters (described using the openAPI descriptor standard) From the openAPI description, A bioinformatician can learn that the ‘geneid’ parameter can be an Entrez or EnsEMBL gene id…
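To make the point of this slide concrete, here is a minimal sketch of an input parameter description. The dictionary is an illustrative Swagger 2.0-style parameter object written for this example, not a copy of the real myGene.info description, and `machine_readable_facts` is a hypothetical helper. Only the parameter name `geneid` and the "Entrez or Ensembl gene id" wording come from the slide.

```python
# Illustrative Swagger 2.0-style parameter object (an assumption, not the
# actual myGene.info description). Only "geneid" and the description text
# are taken from the slide.
geneid_param = {
    "name": "geneid",
    "in": "path",
    "required": True,
    "type": "string",
    "description": "Entrez or Ensembl gene id",
}


def machine_readable_facts(param):
    """Collect the fields software can act on without parsing free text."""
    return {k: param[k] for k in ("name", "in", "required", "type") if k in param}


facts = machine_readable_facts(geneid_param)
```

In this sketch the structured fields only say that `geneid` is a required string. The fact that it must be an Entrez or Ensembl identifier lives in the free-text description, which a human can read but a program cannot reliably interpret.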
  22. 22. 22 @micheldumontier & @markmoby myGene.info: Input parameters (described using the openAPI descriptor standard) Gene myGene.info
  23. 23. 23 @micheldumontier & @markmoby myGene.info: Input parameters (described using the openAPI descriptor standard) Gene myGene.info ?
  24. 24. 24 @micheldumontier & @markmoby myGene.info: Input parameters (described using the openAPI descriptor standard) Gene myGene.info JSON
  25. 25. 25 @micheldumontier & @markmoby GenBank identifier Affymetrix identifier Taxonomy identifier … 1340 lines … HGNC symbol ? NCBI Gene Terminology A big block of JSON! What do these symbols refer to? How do we find out more?
  26. 26. 26 @micheldumontier & @markmoby Two distinct problems: 1) Discovery of a tool that does what you need 2) Understanding how to use the tool you discovered • Its inputs and outputs (what “kind” of information, and in what format/syntax, with which parameter names, required/optional?) • How it can be chained with other tools into more complex analytical workflows.
  27. 27. 27 @micheldumontier & @markmoby More contemporary registries get us closer…
  28. 28. 28 @micheldumontier & @markmoby “Crowdsourced” API registry (some curation) Features ontology-constrained fields
  29. 29. 29 @micheldumontier & @markmoby “Crowdsourced” API registry (some curation) Features ontology-constrained fields GUID
  30. 30. 30 @micheldumontier & @markmoby “Crowdsourced” API registry (some curation) Features ontology-constrained fields EDAM:operation_0346
  31. 31. 31 @micheldumontier & @markmoby “Crowdsourced” API registry (some curation) Features ontology-constrained fields EDAM:data_2044
  32. 32. 32 @micheldumontier & @markmoby “Crowdsourced” API registry (some curation) Features ontology-constrained fields EDAM:data_0857
  33. 33. 33 @micheldumontier & @markmoby “Crowdsourced” API registry (some curation) Features ontology-constrained fields No description of I/O parameters (for non-browser-based interaction) Descriptions of data formats are sometimes available (and also grounded in the EDAM ontology) but inconsistent Only possible to use this API registry for discovery, not for invocation (i.e. solves problem #1, but not #2) Also invented a novel Service Descriptor format → requires de novo tool-building
  34. 34. 34 @micheldumontier & @markmoby Semantic Health and Research Environment - SHARE - Registry (synopsis interface)
  35. 35. 35 @micheldumontier & @markmoby Semantic Health and Research Environment (SHARE) Registry Uses the myGrid Service descriptor (same as )
  36. 36. 36 @micheldumontier & @markmoby Semantic Health and Research Environment (SHARE) Registry Uses ontology terms for both data types and service operation types, much as with (but allows/encourages any ontology)
  37. 37. 37 @micheldumontier & @markmoby Semantic Health and Research Environment (SHARE) Registry SADI standardizes service interfaces such that the interface itself is also defined by these ontology terms (i.e. data must be owl:Individuals of the ontological type)
  38. 38. 38 @micheldumontier & @markmoby Semantic Health and Research Environment (SHARE) Registry …and therefore….
  39. 39. 39 @micheldumontier & @markmoby Semantic Health and Research Environment (SHARE) Registry Automated synthesis of, and invocation of, complex Service pipelines from independent providers
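The idea of synthesizing a pipeline from typed service descriptions can be sketched as a graph search. This is a simplified illustration, not the SADI/SHARE implementation: the service names and type labels below are hypothetical, and types are matched by plain string equality, whereas SADI relies on OWL classes and reasoning.

```python
from collections import deque

# Hypothetical services as (name, input type, output type) triples.
# Real SADI services describe inputs and outputs with OWL classes.
SERVICES = [
    ("gene_to_protein", "Gene", "Protein"),
    ("protein_to_sequence", "Protein", "Sequence"),
    ("sequence_similarity_search", "Sequence", "SearchResults"),
]


def find_pipeline(start_type, goal_type, services=SERVICES):
    """Breadth-first search for the shortest chain of services that turns
    start_type into goal_type. Returns a list of service names, or None."""
    queue = deque([(start_type, [])])
    seen = {start_type}
    while queue:
        current, path = queue.popleft()
        if current == goal_type:
            return path
        for name, in_type, out_type in services:
            # A service can be appended when its input matches what we hold.
            if in_type == current and out_type not in seen:
                seen.add(out_type)
                queue.append((out_type, path + [name]))
    return None


pipeline = find_pipeline("Gene", "SearchResults")
```

The search only works because every service declares what it consumes and produces in a shared vocabulary, which is the property the registry metadata on the earlier slides lacks.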
  40. 40. 40 @micheldumontier & @markmoby Semantic Health and Research Environment (SHARE) Registry Automated “gap filling” for unavailable data Automated detection of useful data combinations
  41. 41. 41 @micheldumontier & @markmoby Semantic Health and Research Environment (SHARE) Registry SADI assumes a world of 100% OWL/RDF data (Good) OWL can be quite hard to write!
  42. 42. 42 @micheldumontier & @markmoby Barely described No automation Hard to find and use Not “FAIR” Richly described Fully automatable Fully FAIR
  43. 43. 43 @micheldumontier & @markmoby Barely described No automation Hard to find and use Not “FAIR” Richly described Fully automatable Fully FAIR An incremental path to increasingly rich semantically-controlled metadata that Does not invent new standards and Is easy for our end-users to create
  44. 44. 44 @micheldumontier & @markmoby
  45. 45. 45 @micheldumontier & @markmoby The goal is to reduce the barrier for the discovery and reuse of web APIs through richer semantic metadata. i) a coordinated facility for the intelligent and facile annotation of smart APIs ii) a web application to discover smart APIs and how they connect to each other. 1 year supplement in collaboration with HeartBD2K center - Peipei Ping (PI), Andrew Su and Chunlei Wu. smartAPI
  46. 46. 46 @micheldumontier & @markmoby Build on API metadata specification standards SWAGGER
  47. 47. 47 @micheldumontier & @markmoby Tools for Intelligent API Metadata Authoring Build on CEDAR technology • Generate the Service metadata capture Web Form from a smartAPI template (CEDAR) • Discover context- appropriate annotation recommendations to enhance harmonization • Validate and give improvement suggestions
  48. 48. 48 @micheldumontier & @markmoby Metadata authoring will connect to numerous existing resources Identifier syntax and link-outs; 475 ontologies and terminologies
  49. 49. 49 @micheldumontier & @markmoby
  50. 50. 50 @micheldumontier & @markmoby Smart Profiling
  51. 51. 51 @micheldumontier & @markmoby Smart Profiling (not the same as “Extreme Vetting” ;-) )
  52. 52. 52 @micheldumontier & @markmoby Using information from identifiers.org, MIRIAM, and prefix-commons, make some intelligent guesses about what a given data field might be → Enhanced suggestions for the end-user annotator
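A rough sketch of this kind of profiling: match sample values from an API response field against identifier patterns and suggest the types that fit. The patterns below are simplified assumptions written for this example, not the expressions published by identifiers.org, MIRIAM, or prefix-commons, and `suggest_types` and its `min_fraction` threshold are hypothetical.

```python
import re

# Simplified, assumed identifier patterns keyed by an illustrative prefix.
# A real profiler would load the patterns from an identifier registry.
PATTERNS = {
    "ncbigene": re.compile(r"^\d+$"),
    "ensembl": re.compile(r"^ENSG\d{11}$"),
    "hgnc": re.compile(r"^HGNC:\d+$"),
}


def suggest_types(values, min_fraction=0.8):
    """Return (prefix, fraction) pairs for every pattern matched by at least
    min_fraction of the sample values, best match first."""
    suggestions = []
    for prefix, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(str(v)))
        fraction = hits / len(values) if values else 0.0
        if fraction >= min_fraction:
            suggestions.append((prefix, fraction))
    return sorted(suggestions, key=lambda s: s[1], reverse=True)


guess = suggest_types(["ENSG00000123374", "ENSG00000141510"])
```

The output is meant as a suggestion for a human annotator to confirm, since a loose pattern such as "all digits" fits many unrelated identifier schemes.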
  53. 53. 53 @micheldumontier & @markmoby Use this to automatically map API data to Linked Open Data
  54. 54. 54 @micheldumontier & @markmoby Steps along the stairway…
  55. 55. 55 @micheldumontier & @markmoby Metadata Survey We performed a survey of 3 repositories (Biocatalogue, Programmable Web, Elixir Tools & Services Registry) and 4 specifications (MIAS, OPEN API, SADI, schema.org), plus a preliminary smartAPI metadata specification.
  56. 56. 56 @micheldumontier & @markmoby Metadata Elements 20 basic, 6 provider, 10 operation, 12 parameters, 6 response
  57. 57. 57 @micheldumontier & @markmoby MUST • Name • Access Point SHOULD • Description • Documentation • Response MIME-Type • Terms of Service • Authentication Mode • Version • SSL Support MAY • Website • Category • Publications • API Access Restrictions • Access Point Mirrors • API Metadata Format • API Access Mode • API Location • API Implementation Language • API Maturity • Social Media Links
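The MUST and SHOULD tiers on this slide lend themselves to a simple completeness check. The sketch below is an illustration only, not the smartAPI validator: the snake_case field keys are an assumed naming for the slide's element names, and `check_metadata` is a hypothetical helper. MAY elements are optional and therefore not checked.

```python
# Requirement tiers taken from the slide; the snake_case keys are an
# assumed spelling of the element names.
MUST = ["name", "access_point"]
SHOULD = [
    "description",
    "documentation",
    "response_mime_type",
    "terms_of_service",
    "authentication_mode",
    "version",
    "ssl_support",
]


def check_metadata(record):
    """Return (errors, warnings) for a metadata record given as a dict.
    Missing MUST fields are errors, missing SHOULD fields are warnings."""
    errors = [f"missing required field: {f}" for f in MUST if not record.get(f)]
    warnings = [f"missing recommended field: {f}" for f in SHOULD if not record.get(f)]
    return errors, warnings


example = {
    "name": "myGene.info",
    "access_point": "http://mygene.info/",
    "description": "Gene annotation query service",
}
errors, warnings = check_metadata(example)
```

Separating errors from warnings mirrors the incremental approach of the project: a sparse description is still accepted, and the warnings point the author to the next fields worth filling in.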
  58. 58. 58 @micheldumontier & @markmoby Metadata authoring made easier. We augmented the Swagger Editor to autocomplete using the smartAPI Repository and enabled validation against the smartAPI specification.
  59. 59. 59 @micheldumontier & @markmoby
  60. 60. 60 @micheldumontier & @markmoby Faceted Search Interface. We implemented a lightweight web-based tool to perform faceted search and filtering over the Elasticsearch repository of smartAPI descriptions.
  61. 61. API Interoperability WG People Michel Dumontier Amrapali Zaveri Shima Dastgheib Chunlei Wu Ruben Verborgh Caty Chung Raymond Terryn Paul Avillach Gregg Kellogg Nolan Nichols http://mygene.info/ http://ruben.verborgh.org/blog/2013/11/29/the-lie-of-the-api/ http://smart-api.info/website/ http://www.lincsproject.org/ http://bd2k-picsure.hms.harvard.edu https://spec-ops.io http://nidm.nidash.org/ Kevin Osborn David Steinberg https://cgl.genomics.ucsc.edu/ Mark Wilkinson Mary Shimoyama Jeff De Pons Denise Luna http://sadiframework.org https://bd2kccc.org/ http://rgd.mcw.edu/ Kathleen Jagodnik 61 @micheldumontier & @markmoby
  62. 62. 62 @micheldumontier & @markmoby • Facilitate the discoverability, interoperability, and reuse of web-based APIs – Eliminate API data silos by providing FAIR (Findable, Accessible, Interoperable, Reusable) Linked Data. • The tools, technologies, and design patterns developed in the pilot and WG should generalize to API development across the BD2K consortium (and beyond). Take-home Message
  63. 63. 63 @micheldumontier & @markmoby • Michel Dumontier • Chunlei Wu • Cyrus Afrasiabi (backend, repository API) • Trish Whetzel (API profiling) • Yash Vyas (recommendation engine) • Amrapali Zaveri (metadata survey, template, web application, evaluation) • Andrew Su (evaluation) • Mark Wilkinson (evaluation) TEAM
  64. 64. michel.dumontier@stanford.edu Website: http://dumontierlab.com Presentations: http://slideshare.com/micheldumontier @micheldumontier & @markmoby markw@illuminae.com 64
