Reusing Public and Third Party Web Services
Where… can I find them? advertise mine?
What… do they do? can I use them?
How… do they work? up to date? reliable?
Who… provides them? recommends them? knows about them?
Pathport Web service from the Virginia Bioinformatics Institute
http://pathport.vbi.vt.edu/services/wsdls/beta/glimmer.wsd
Name of the service
Uninformative names for parameters
What kind of string?
Services In the Wild
EMBOSS clustalw program called ‘emma’
SOAP / REST / Quasi-REST / REST-like
Input0:string, Output0: string
What does SeqRet actually do?
Example data? Parameter configurations? Input-Output correlations?
Quality of Service, Monitoring, Robustness
Volatility, Sustained, License, Conditions of Use
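The `Input0:string, Output0: string` problem above can be made concrete with a small sketch. The `Parameter` model, the list of generic names, and the FASTA example are all illustrative assumptions, not part of any real service description; the point is only that a bare string slot tells a consumer nothing until someone annotates it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parameter:
    """One input or output of a service operation (hypothetical model)."""
    name: str
    data_type: str
    description: str = ""
    example: str = ""


# Names that carry no meaning once trailing digits are stripped (assumed list).
GENERIC_NAMES = {"input", "output", "arg", "param", "in", "out"}


def is_uninformative(param: Parameter) -> bool:
    """Flag parameters a consumer cannot interpret without outside help."""
    generic_name = param.name.rstrip("0123456789").lower() in GENERIC_NAMES
    untyped = param.data_type.lower() == "string" and not param.description
    return generic_name or untyped


# As found in the wild: nothing says what the string should contain.
opaque = Parameter(name="Input0", data_type="string")

# The same slot after annotation (format and example are assumptions).
annotated = Parameter(
    name="sequence",
    data_type="string",
    description="Nucleotide sequence in FASTA format",
    example=">seq1\nACGTACGT",
)

print(is_uninformative(opaque))     # True
print(is_uninformative(annotated))  # False
```

A check like this could drive a curation queue: services whose parameters are flagged are the ones that most need example data and descriptions.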
Cataloguing to avoid reinvention
Investigator and project specific registries
General catalogues and search engines
An Open, Public, Curated, Boutique Catalogue for Web Services serving the Life Sciences for the Bioinformatics Community
http://www.biocatalogue.org
Launched June 2009
Nucl Acids Res, June 2010, Web Servers issue, doi:10.1093/nar/gkq394
A reliable, trusted, up to date and sustained catalogue customised for the Life Sciences.
EMBL-EBI service commitment
Discover and use services.
Community and specialist curation.
Pooled annotations and experiences.
A platform for service monitoring and analytics.
A resource, A Web Service
Incorporate into applications.
Building a community of providers and consumers together
UNDERSTAND and USE
1719 services – SOAP and REST
92% with service description
57.5% with all ops/methods described
Big players: EBI, NCBI, DDBJ etc….
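Figures such as "92% with service description" and "57.5% with all ops/methods described" are coverage ratios over the catalogue's records. A minimal sketch of how such ratios could be computed follows; the `ServiceRecord` shape and the sample records are assumptions made for illustration and do not reflect the catalogue's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServiceRecord:
    """Hypothetical catalogue entry: a service and its operation descriptions."""
    name: str
    description: str = ""
    operation_descriptions: Dict[str, str] = field(default_factory=dict)


def coverage(records: List[ServiceRecord]) -> Dict[str, float]:
    """Fraction of services described, and fraction with every operation described."""
    if not records:
        return {"described": 0.0, "all_operations_described": 0.0}
    described = sum(1 for r in records if r.description.strip())
    complete = sum(
        1
        for r in records
        if r.operation_descriptions
        and all(text.strip() for text in r.operation_descriptions.values())
    )
    return {
        "described": described / len(records),
        "all_operations_described": complete / len(records),
    }


# Invented sample data: three of four services have a description,
# only one has every operation described.
sample = [
    ServiceRecord("A", "Aligns sequences", {"run": "Start a job", "get": "Fetch result"}),
    ServiceRecord("B", "Retrieves entries", {"fetch": ""}),
    ServiceRecord("C", "Predicts genes", {}),
    ServiceRecord("D", "", {"run": "Start a job", "status": ""}),
]

print(coverage(sample))  # {'described': 0.75, 'all_operations_described': 0.25}
```

Note that a service with no recorded operations counts as incomplete here; that is a design choice, and a different choice would shift the second ratio.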
60 operations on chemistry and chem-informatics data
Protein Seq. Alignment, Protein Structure Prediction, Protein Function Prediction, Nucleotide Seq. Alignment, RNA Structure Prediction, Gene Prediction, Text Mining, Ontology, Phylogeny, Microarray, Sequence Retrieval, Identifier Retrieval, Structure Retrieval, Literature Retrieval, Genomics, Proteomics, Systems Biology, Biostatistics, Chemoinformatics
EMBL-EBI, DDBJ, NCBI
But these statistics have to be interpreted…
Curation Change logs Quantitative Annotations Tags Semantic Annotations Ontologies Functional Capabilities Provenance Operational Capabilities Operational Metrics Use Policy Social Status Ratings Attribution Free text Instrumentation Usable and Useful Understandable
Incremental Annotation 50,672
accumulate, aggregation, types, attribution
Archived Service Annotations Attribution
Tagging Social Annotate Anything Categories
Operations Inputs Outputs Example use
Test script sandbox
Based on EMBRACE Registry Monitoring Framework
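Monitoring of this kind ultimately reduces to running checks repeatedly and summarising the outcomes. The sketch below shows one way an availability figure per service could be derived from a log of check results. The `CheckResult` type and the service names are hypothetical, and this is not the EMBRACE framework's API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class CheckResult:
    """Outcome of one monitoring probe or test script run (hypothetical)."""
    service: str
    ok: bool


def availability(results: Iterable[CheckResult]) -> Dict[str, float]:
    """Fraction of successful checks per service."""
    passed: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for result in results:
        total[result.service] = total.get(result.service, 0) + 1
        passed[result.service] = passed.get(result.service, 0) + (1 if result.ok else 0)
    return {name: passed[name] / total[name] for name in total}


# Invented log: ten checks for one service (nine pass), four for another (two pass).
log = (
    [CheckResult("alignment", True)] * 9
    + [CheckResult("alignment", False)]
    + [CheckResult("retrieval", True)] * 2
    + [CheckResult("retrieval", False)] * 2
)

print(availability(log))  # {'alignment': 0.9, 'retrieval': 0.5}
```

A real monitor would also record timestamps and response times; they are left out here to keep the ratio itself visible.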
Social Sharing Feeds
WSDL, SAWSDL, SA-REST, WSMO
RDF and SPARQL
Service annotation formats
Gadgets, Apps
Customised and Private instances
A service / resource
Open Source (BSD)
Open Platform
Read (Write) REST APIs
EDAM, BioMOBY, myGrid, OBO family, BioXSD
Content Capture & Curation
People Powered Content
Reward and Attribution
Sensitivities
Tools
Bringing a Community together
Automation
Core Contribution & Curation
Coordination
Governance
Ownership / submitter / curator responsibilities
Curating third party services is HARD
The Reality of Web Services in the Life Sciences
The Reality of (Expert) Crowd Sourcing Contributions for a Web Service Catalogue
Eight years ago Lincoln Stein said…
“An interface is a contract between data provider and data consumer”
Stein L. Creating a bioinformatics nation. Nature 2002;417:119-120.
A Public interface means a Public Service
Thinking local not global
Local configuration bake-ins
Scalability – I/O and load
Interface granularity and interaction chattiness
Silent API volatility
BioCatalogue Change logs
Web Interface trumps API
Local application trumps dependent external ones
Ensembl API: updated on every release, not backward compatible with obscured versioning. BioMART: exposed internal identifier formats and then changed them.
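Silent volatility of the kind described above can be made visible by diffing two snapshots of an interface and publishing the result as a change log. The sketch below assumes an interface is summarised as a mapping from operation name to an ordered list of parameter names. That representation and the operation names are hypothetical, and this is not how BioCatalogue computes its change logs.

```python
from typing import Dict, List

# Operation name -> ordered parameter names (assumed snapshot format).
Interface = Dict[str, List[str]]


def diff_interfaces(old: Interface, new: Interface) -> List[str]:
    """Return human-readable change log entries between two snapshots."""
    changes: List[str] = []
    for op in sorted(old.keys() - new.keys()):
        changes.append(f"REMOVED operation {op}")
    for op in sorted(new.keys() - old.keys()):
        changes.append(f"ADDED operation {op}")
    for op in sorted(old.keys() & new.keys()):
        if old[op] != new[op]:
            changes.append(f"CHANGED operation {op}: {old[op]} -> {new[op]}")
    return changes


# Invented snapshots of the same service taken at two releases.
release_1 = {"runAlignment": ["sequence", "format"], "getStatus": ["jobId"]}
release_2 = {"runAlignment": ["sequence", "format", "email"], "getResult": ["jobId"]}

for entry in diff_interfaces(release_1, release_2):
    print(entry)
# REMOVED operation getStatus
# ADDED operation getResult
# CHANGED operation runAlignment: ['sequence', 'format'] -> ['sequence', 'format', 'email']
```

A diff like this only catches signature changes. Changes in identifier formats or in the meaning of a field, as in the BioMART example, need example data and regression tests to detect.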
(Public) Service Sustainability
2 year availability, responsibility migration/hole, service decay -> application decay
58% developed by students, 24% stated not maintained
(Schultheiss et al. (2010) PLoS Comp Biol (in review))
146 services archived, >90% availability
Sustainability strategy
Make it portable
Provide documentation
Use existing frameworks and practices
Involve the community and know your users
Plan sunset or migration
Funding models for sustainability
Preservation
Schultheiss et al. (2010) PLoS Comp Biol (in review)
Geek Usability Quasi-Standards
Which service? Need to know precisely what is expected for every service at the same endpoint
The SOAP/REST technical view over services is not enough
Need a functional / task-oriented view
Functional Unit annotation
Service description abstraction
Services as functional tasks
Within the boundary of a service
Independent from technology used
[Diagram: a Service and its Operations; WSDL, REST, DAS]
[Missier et al. 2010, Functional Units: Abstractions for Web Service Annotations]
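The idea of describing services as functional tasks, independent of the technology used, can be sketched as a small data structure. This is only loosely inspired by the functional-unit abstraction cited above and is not the model from that paper; the class names, the example service, and its operations are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Operation:
    """A single technical operation as exposed by the provider."""
    name: str
    technology: str  # e.g. "WSDL", "REST" or "DAS"


@dataclass
class FunctionalUnit:
    """A user-level task, realised by one or more operations."""
    task: str
    operations: List[Operation] = field(default_factory=list)


@dataclass
class Service:
    name: str
    units: List[FunctionalUnit] = field(default_factory=list)

    def tasks(self) -> List[str]:
        """The functional view: what the service does, not how it is wired."""
        return [unit.task for unit in self.units]

    def operations_for(self, task: str) -> List[str]:
        """Operation names a consumer must call to carry out one task."""
        for unit in self.units:
            if unit.task == task:
                return [op.name for op in unit.operations]
        return []


# Hypothetical asynchronous alignment service: four operations, two tasks.
aligner = Service(
    name="ExampleAligner",
    units=[
        FunctionalUnit(
            task="align protein sequences",
            operations=[
                Operation("submitJob", "WSDL"),
                Operation("pollStatus", "WSDL"),
                Operation("fetchResult", "WSDL"),
            ],
        ),
        FunctionalUnit(
            task="list supported matrices",
            operations=[Operation("getMatrices", "WSDL")],
        ),
    ],
)

print(aligner.tasks())
print(aligner.operations_for("align protein sequences"))
```

Annotating at the level of the unit means a curator describes the task once, rather than describing `submitJob`, `pollStatus` and `fetchResult` separately.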
Complexity because it’s a database really
SABIO-RK
Service only Taverna workflow
Find chemical reactions that are associated with a given metabolite, and the kinetics associated with those reactions.
Writing reusable, reliable (public) services with good and stable interfaces for others is hard
A service interface is different to a web interface or a database query interface.
Public interfaces – internal interfaces mismatch
Publishing an interface is a publishing step.
Technologist – User mismatch
Eat your own dog food
Takes resource, time and trouble
But will pay off! We can’t afford to reinvent.
Enterprise Concerns: real or perceived?
HTTPS trusted peers inside a firewall
WS-Security and OAuth (REST)
Or is it fear of using external data?
Signature granularity and chattiness
Data shipping vs reference shipping
XML and JSON are not the only formats
Service Level Agreements
Technical or social issues?
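The cost of signature granularity and chattiness can be illustrated by counting round trips. The sketch below uses an in-memory stand-in for a remote service, so the class, its methods, and the sample records are assumptions and no network call is made. It compares a fine-grained operation called once per identifier with a coarse-grained batch operation.

```python
from typing import Dict, Iterable, List


class CountingService:
    """In-memory stand-in for a remote service that counts round trips."""

    def __init__(self, records: Dict[str, str]) -> None:
        self._records = dict(records)
        self.round_trips = 0

    def get_one(self, identifier: str) -> str:
        """Fine-grained operation: one request per record."""
        self.round_trips += 1
        return self._records[identifier]

    def get_many(self, identifiers: Iterable[str]) -> List[str]:
        """Coarse-grained operation: one request for the whole batch."""
        self.round_trips += 1
        return [self._records[i] for i in identifiers]


# Invented identifiers and sequences.
records = {f"P{n}": "MKT" * n for n in range(1, 6)}
ids = sorted(records)

chatty = CountingService(records)
chatty_result = [chatty.get_one(i) for i in ids]

batched = CountingService(records)
batched_result = batched.get_many(ids)

print(chatty.round_trips)   # 5
print(batched.round_trips)  # 1
print(chatty_result == batched_result)  # True
```

The same data arrives either way; what changes is the number of requests, and with it the latency and the load on the provider. Batch operations trade that saving for larger payloads, which is where the choice between data shipping and reference shipping comes in.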
Socialising the community
10:90 long tail rule
Content feedback spiral
Widen - Smart application feeds
Resourced core content team
Cost of Crowd Curation
Emerging, evolving, exciting and challenging Web service ecosystem
BioCatalogue draws together services, knowledge and community to provide intelligence.
Crowd collaboration to scale contribution, core to coordinate
Open effort – contribute or adopt
Core resource – for Alliances and Journals
Social + technical challenges
Christian Hauck’s talk 16.00 Thursday.
Credits
Thomas Laurent, Hamish McWilliams, Franck Tanoh, Jiten Bhagat, Carole Goble, Steve Pettifer, Katy Wolstencroft, Robert Stevens, David De Roure, Mannie Tagarira, Jerzy Orlowski, Sergejs Aleksejevs, Rodrigo Lopez, Eric Nzuobontane
Thank You
http://www.biocatalogue.org
About Us - http://wiki.biocatalogue.org
API Docs - http://apidocs.biocatalogue.org
11th July 2010, ISMB 10
Bhagat, J., Tanoh, F., Nzuobontane, E., Laurent, T., Orlowski, J., Roos, M., Wolstencroft, K., Aleksejevs, S., Stevens, R., Pettifer, S., Lopez, R., Goble, C.A.: BioCatalogue: a universal catalogue of web services for the life sciences, Nucl. Acids Res., 2010. doi:10.1093/nar/gkq394