BioIT Europe 2010 - BioCatalogue
BioCatalogue presentation at BioIT Europe Hannover 2010 by prof Carole Goble

  • (Yes, it's boring. But GOOD.) Silver bullets
  • Long tail middle two…. Long tail of consumers
  • In his visionary comment, Lincoln Stein called for standardization in bioinformatics, suggesting web services (http://www.w3.org/standards/webofservices) as the unifying platform for programmatic interfaces to tools and data sources (Stein, 2002). Nowadays, the ELIXIR project chooses SOAP web services for programmatic access to all considered bioinformatics databases and tools (http://www.elixir-europe.org/page.php?page=wp7). The Web Service Interoperability Organisation (WS-I, http://ws-i.org), supported by the main IT companies, constrains even more strictly the W3C's SOAP-service standards in order to maximize interoperability among the web services and the web-service programmatic libraries.
  • The same service can have two different implementations, one in each.
    References: Reconciling Web Services and REST Services http://www.w3.org/2005/Talks/1115-hh-k-ecows/#(1); WADL: The REST answer to WSDL http://searchsoa.techtarget.com/tip/0,289483,sid26_gci1265367,00.html
    WSDL: short for Web Services Description Language, an XML-formatted language used to describe a Web service's capabilities as collections of communication endpoints capable of exchanging messages. WSDL is an integral part of UDDI, an XML-based worldwide business registry; WSDL is the language that UDDI uses. WSDL was developed jointly by Microsoft and IBM.
    REST: short for Representational State Transfer, an architectural style for large-scale software design. REST was first articulated by Roy Fielding in his dissertation as: "REST emphasizes scalability of component interactions, generality of interfaces, independent deployment of components, and intermediary components to reduce interaction latency, enforce security, and encapsulate legacy systems. I describe the software engineering principles guiding REST and the interaction constraints chosen to retain those principles, contrasting them to the constraints of other architectural styles. Finally, I describe the lessons learned from applying REST to the design of the Hypertext Transfer Protocol and Uniform Resource Identifier standards, and from their subsequent deployment in Web client and server software."
    Basic terminology: SOAP refers to Simple Object Access Protocol. HTTP-based APIs are APIs that are exposed as one or more HTTP URIs; typical responses are in XML / JSON, and response schemas are custom per object. REST, on the other hand, adds an element of using standardized URIs and gives importance to the HTTP verb used (i.e. GET / POST / PUT etc.).
    Although the last few years have seen growth in the number of Web Services, the hype surrounding SOAP has barely reduced. Internet architects have come up with a surprisingly good argument for pushing SOAP aside: there's a better method for building Web services in the form of Representational State Transfer (REST). REST is more of an old philosophy than a new technology, although the realization came later. Whereas SOAP looks to jump-start the next phase of Internet development with a host of new specifications, the REST philosophy espouses that the existing principles and protocols of the Web are enough to create robust Web services. This means that developers who understand HTTP and XML can start building Web services right away, without needing any toolkits beyond what they normally use for Internet application development.
  • boutiques
  • Guessimate 3000+ Web Services in Life Science publicly available
  • Scientists are naughty. Reuse is hard: "I used it last time so it will work again the same way… damn!" Services change location, capabilities and signatures (BioMART changed its interface three times in 2006); new ones appear and existing ones disappear (SeqHound); they decay and become outdated or unreliable.
  • Sustainability; rich enough annotation; customisation; curation; community engagement; accessibility and availability. Find, Publish, Understand, Use, Monitor, Curate, Archive. Investigator and project specific registries: EMBRACE, BioSapien, Stargate Portal. Community lists: Bioinformatics Links Directory, BioLinks, BioPlanet. Project specialist registries: BioMOBY Central, DAS Registry, myGrid Registry, Sswap. General catalogues and search engines: SeekDa!, Web Services List, XMethods.
  • Provide a single registration point for Web Service providers and a single search site for scientists and developers. Provide a curated catalogue of life science web services. Providers, expert curators and users will provide oversight, monitor the catalogue and provide high quality annotations for services. BioCatalogue as a place where the community can find, contact and meet the experts and maintainers of these services. A means to pool metadata about services. A means to discover and reuse services. A means to curate services. A platform for service monitoring and analytics. A generic service annotation model for community annotation.
  • A means to pool metadata about services in the wild A means to discover and reuse those services A means to curate services A platform for service monitoring and analytics A supermarket
  • To do the following: Find, Publish, Understand, Use, Monitor, Curate, Retire
  • 11 chemistry web services (EBI, ChemSpider, PubChem) > 60 operations on chemistry and cheminformatics data.
  • Small but beautiful
  • Annotate the annotations
  • Search, Browse, Filter, Follow
  • Annotation platform Would like it to be an execution platform.
  • We need to provide comprehensive APIs to the registry. Export & import standards: WSDL, SAWSDL, SA-REST, WSMO …; RDF and SPARQL. Web 2.0: open REST interface; plugin & mash-up; open to Google; URLs for bookmarking. Development model: perpetual beta; user driven; BioCatalogue Friends. Open platform with open REST interfaces. Web 2.0 site and development. Open source code base.
  • Ownership and responsibility
  • Sensitivities
  • Provide plenty of advance warning. "An interface is a contract between data provider and data consumer." Document the interface; warn if it is unstable. Do not make changes lightly: even little fiddly changes break things (like changing internal ids). When possible, maintain legacy interfaces until clients can port their scripts. Support as many interfaces as you can: HTML, text only (better), HTTP, REST, SOAP. Easy interfaces + power user interfaces.
  • Web Service life cycle. Our current versioning capabilities consist of monitoring changes to the WSDL, updating the existing entry in place and then adding entries to the change log. We asked users what would be useful, and they mentioned that something like a change log / revision history would be very useful.
    S. J. Schultheiss et al. (2010) PLoS Comp Biol (in review): 64% of services used by researchers without computational background; 58% of services developed by students only, difficult to maintain after graduation; 24% of services will not be maintained. External install-ability: they only care that it works on their own machines.
    Services are in constant and often silent change. Dynamic and unstable. Metadata decay (esp. on service instances). Workflow decay. Monitoring and repair. BioNanny. Implications for preservation, not fossilisation. Implications for sustainability.
    The Ensembl database: the Ensembl API is updated at every Ensembl release and needs to be used with only the same database version as the API, e.g. you can't use the 59 API on a 55 database or vice versa. It definitely is kept up-to-date and the latest version is 59.
    Quoted helpdesk email: "First of all, check that you are using the API code checked out from the version 59 branch of the Ensembl CVS. Then, if you are, it sounds like the Registry call you are making might simply be returning the wrong result. It might be worth you checking this with the Ensembl helpdesk (helpdesk@ensembl.org). You could try loading each species explicitly to see if that fixes it (http://www.ensembl.org/info/docs/api/registry.html). cheers, Richard"
  • So this is an ongoing and dynamic live system. Know your audience and environment. Provide documentation and assistance. Assist users and involve the community. Use an existing framework, in a standard best-practice way. Make it portable. Be explicit about changes. Leave a forwarding address. Find someone else to do it. Plan the end of the service life cycle.
  • URI patterns:
    http://xml.nig.ac.jp/rest/Invoke?service={x}&method={y}&...
    http://xml.nig.ac.jp/{service}/{method}?...
    http://www.ebi.ac.uk/cgi-bin/dbfetch?db={db}&id={id}&format={f}
    http://www.ebi.ac.uk/dbfetch/{db_name}/{id}?format={f}
    http://www.myexperiment.org/workflow.xml?id={id}
    http://www.myexperiment.org/workflows/{id}
    Consistent implementation. The structure of the URIs should be intuitively understandable and predictable; URLs and parameters should be self-descriptive. Unambiguous use of URL parameter names. If XML is used as the data exchange standard, there must be an XSD schema. Need for standards, for example conformance to WADL. Proper use of HTTP status codes and HTTP verbs. Avoid polymorphic services, where values of certain parameters determine the combination of other parameters that the service expects to find in the URL.
    References: REST in Practice, Savas Parastatidis http://restinpractice.com/default.aspx; Reconciling Web Services and REST Services http://www.w3.org/2005/Talks/1115-hh-k-ecows/#(1); WADL: The REST answer to WSDL http://searchsoa.techtarget.com/tip/0,289483,sid26_gci1265367,00.html
  • If you type 'togows' you'll find a number of services with an unreasonable number of operations (from 72 to 369). They make annotation and usage very difficult. Each SOAP operation has a URL, which is an endpoint. SOAP uses the web as a transport protocol. Each service has a base URL/endpoint; this is monitored along with the WSDL URL. Services vs operations. As mentioned, one of the biggest issues is that the people who build the services have a lot of implicit and assumed knowledge that they don't share with the people consuming the service (or don't share in an accessible way). The big providers tend to have very long documentation pages / user guides that are sometimes hard to use as a "quick start".
  • So we need to describe not just the interface but the behaviour, for user function abstractions (goals). Operations: orchestration or pattern based. Asynchronous services. Server-like services (e.g. Soaplab). Services in the wild are worse than we think; we've come across these different types of service. Multiple operations -> 1 task: when these services are annotated on individual operations, a gap remains between the user's perspective of service operations as tasks with a well-defined function and the service provider's technological view. We argue that this gap can be filled by choosing to annotate at a higher level of abstraction; that's what we name the functional unit (FU). KEGG: Kyoto Encyclopedia of Genes and Genomes
  • Asynchronous pattern
  • Because the services have been ripped from their Orchestration Fabric
  • User perspective vs execution/developer perspective. To clearly annotate web services we need another layer of abstraction, independent of the technology used. This presentation uses a number of examples to define the FU. The work presented here stems from the observation that current annotation models force users to think in terms of service interface rather than high-level functionality. FU: the elementary units of information used to describe a service. Using widely used web services in Life Science, we define the FU as configurations and compositions of underlying service operations. An FU is limited to the set of operations that are part of the same service.
  • While this would not be surprising when trying to connect operations from heterogeneous services, single-service workflows that require adapters seem indicative of a poor service interface design or to perform a complex query using data from a single database. The following example illustrates one of these complex composite FUs involving SABIO-RK. Fig. 4(a) shows a composite FU as an ideal sequence of processors. The purpose of this biochemical FU is to find chemical reactions that are associated with a given metabolite, and the kinetics associated with those reactions. This is an ideal workflow in the sense that it "skips over" the adapters that are required to make the data pipeline work in practice for identifying chemical reactions for a given set of metabolites using data within SABIO-RK. The relevant fragment of the actual workflow is shown in Fig. 4(b). The additional processors are scripts that perform local data manipulation (in this case, set intersection and parsing of lines in a text file). When these composite functional units are properly annotated, the significant effort required for their design translates into high added value for third party users who discover them through BioCatalogue.
  • Just sticking the Java API out there is harsh
  • http://en.wikipedia.org/wiki/WS-Security
  • Team curation: make it fun! An environment in which to bring providers, consumers and experts together, maybe through the use of *discussions*. I'm not sure this is happening, i.e. the social element. Are providers too afraid to throw themselves out there a bit more? Are consumers quickly getting turned off using certain services because they are hard to use and there is no one to contact about this? Things like BioStar (http://biostar.stackexchange.com/) seem to be very active with questions/answers. But this is disjoint from the actual information on the web services (maybe this is just how the web tends to organically work anyway?). More to come as I think about it. Jits
    If you build it they will not come. We have done little on getting people to come to the site to make comments; the workflow is find it, download it, bye. The front page: I don't know where to go to discuss anything. There isn't a discuss button. Where would you start? If BioCatalogue was integrated into Bioclipse and you could comment from there, that might help. Or from Eclipse in general. Why would the providers care? BioCatalogue is not a social web site.
  • Figuring out a stranger's web service is very hard. Attribution. Curating is hard. Third party is hard. Responsibility and ownership. Reward and credit and downloads/access for providers.

BioIT Europe 2010 - BioCatalogue Presentation Transcript

  • 1. The Reality of Web Services in the Life Sciences. Professor Carole Goble, University of Manchester, UK. myGrid Project. BioIT World Europe 2010, Hannover. http://www.biocatalogue.org
  • 2. Web Services
    • Programmatic Interfaces to Services.
    • Machine-Machine communication
    • Software Lego™ that works across the web and underpins enterprise SOA.
    • Standard interfaces.
    • Two big families:
      • SOAP and REST.
  • 3. Programmatic Interfaces to Services on the up…..
    • Specialisation and segregation of methods from monolithic servers.
    • Component packaging.
    • Publishing data and analyses.
    • Tools / resources integration.
    • Applications, analytic workflows, workbenches and enterprise platforms
    • Agile software development
    • Remote and in house execution
    • Loosely coupled systems.
    http://www.myexperiment.org/workflows/158.html
  • 4. Service Providers and Consumers
    • Core facility (EMBL-EBI, DDBJ, NCBI …)
    • EMBL-EBI 8-10 million hits/month
    • 329 services
    • Community projects and labs
    • Single Investigator projects
    • Enterprises (e.g. Pharmas)
    Public Private
  • 5. Web Service Rhetoric
    • Pistoia Alliance
    • BioIT Alliance
    • ELIXIR
    • But not all rosy … see Christian Hauck’s talk 16.00 Thursday.
  • 6. Web Service Technology Standards
    • Simple Object Access Protocol
      • Remote Procedure Call based
      • HTTP used as a transport protocol only
      • Web Service Description Language in XML, UDDI registry
      • Extensible
    • Representational State Transfer
      • Resource (document) style
      • HTTP and URI application protocol
      • XML and JSON responses, usually
      • GET / PUT / POST
      • Lightweight, webby
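The contrast between the two families on this slide can be sketched in a few lines. This is only an illustration: the base URL, the `fetchEntry` operation and the parameter names are hypothetical and do not belong to any real service. The point is where the request's meaning lives: in the URI and HTTP verb for REST, in an XML envelope posted to a single endpoint for SOAP.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# Hypothetical base URL, used only for illustration.
BASE = "http://example.org/services"


def rest_request(db: str, entry_id: str, fmt: str = "xml") -> tuple[str, str]:
    """REST style: the URI names the resource and the HTTP verb says what to do."""
    return "GET", f"{BASE}/{db}/{entry_id}?{urlencode({'format': fmt})}"


def soap_request(operation: str, params: dict[str, str]) -> tuple[str, str, str]:
    """SOAP style: one endpoint; the operation and its arguments travel in an
    XML envelope and HTTP merely carries the message."""
    args = "".join(f"<{name}>{escape(value)}</{name}>" for name, value in params.items())
    envelope = (
        '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        f"<soap:Body><{operation}>{args}</{operation}></soap:Body>"
        "</soap:Envelope>"
    )
    return "POST", f"{BASE}/soap", envelope


if __name__ == "__main__":
    print(rest_request("uniprot", "P12345"))
    print(soap_request("fetchEntry", {"db": "uniprot", "id": "P12345"}))
```

Neither function touches the network; they only build what a client would send.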
  • 7. Bio Service Special Flavours
    • Distributed Annotation Services ( www.biodas.org )
    • BioMOBY ( www.biomoby.org )
    • SADI
    • SSWAP (iPlant Collaborative)
  • 8. Reusing Public and Third Party Web Services
    • Where… can I find them? advertise mine?
    • What… do they do? can I use them?
    • How… do they work? up to date? reliable?
    • Who… provides them? recommends them? knows about them?
  • 9. Web Service Description Language
    • <wsdl:message name="getGlimmersResponse">
    • <wsdl:part name="getGlimmersReturn" type="xsd:string"/>
    • </wsdl:message>
    • <wsdl:message name="aboutServiceRequest"/>
    • <wsdl:message name="getGlimmersRequest">
    • <wsdl:part name="in0" type="xsd:string"/>
    • <wsdl:part name="in1" type="xsd:string"/>
    • <wsdl:part name="in2" type="xsd:string"/>
    • <wsdl:part name="in3" type="xsd:string"/>
    • <wsdl:part name="in4" type="xsd:string"/>
    • <wsdl:part name="in5" type="xsd:string"/>
    • <wsdl:part name="in6" type="xsd:string"/>
    • <wsdl:part name="in7" type="xsd:int"/>
    • <wsdl:part name="in8" type="xsd:string"/>
    Pathport Web service from the Virginia Bioinformatics Institute: http://pathport.vbi.vt.edu/services/wsdls/beta/glimmer.wsd
    Name of the service. Uninformative names for parameters. What kind of string?
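The "uninformative names for parameters" problem can be detected mechanically. The sketch below parses a WSDL 1.1 fragment modelled on the slide and lists message parts whose names look auto-generated. The enclosing `wsdl:definitions` element and the shortened part list are added here so the snippet parses on its own, and the `GENERIC` pattern is an assumed heuristic rather than a rule from BioCatalogue.

```python
import re
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace

# Fragment modelled on the slide; the wrapper element and the shortened
# part list are assumptions made so that the snippet is self-contained.
FRAGMENT = """
<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
                  xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <wsdl:message name="getGlimmersRequest">
    <wsdl:part name="in0" type="xsd:string"/>
    <wsdl:part name="in1" type="xsd:string"/>
    <wsdl:part name="in7" type="xsd:int"/>
  </wsdl:message>
  <wsdl:message name="getGlimmersResponse">
    <wsdl:part name="getGlimmersReturn" type="xsd:string"/>
  </wsdl:message>
</wsdl:definitions>
"""

# Assumed heuristic: generated names look like in0, arg1, param2, ...
GENERIC = re.compile(r"^(in|out|arg|param|input|output)\d+$", re.IGNORECASE)


def uninformative_parts(wsdl_text: str) -> list[tuple[str, str, str]]:
    """Return (message name, part name, part type) for generic-looking parts."""
    root = ET.fromstring(wsdl_text)
    found = []
    for message in root.findall(f"{{{WSDL_NS}}}message"):
        for part in message.findall(f"{{{WSDL_NS}}}part"):
            if GENERIC.match(part.get("name", "")):
                found.append((message.get("name"), part.get("name"), part.get("type")))
    return found


if __name__ == "__main__":
    for message, part, part_type in uninformative_parts(FRAGMENT):
        print(f"{message}: parameter '{part}' ({part_type}) needs a description")
```

A check like this can only flag the gap; saying what kind of string `in0` expects still needs a human annotation.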
  • 10. Services In the Wild
    • Find
    • EMBOSS clustalw program called ‘emma’
    • Execute
    • SOAP / REST / Quasi-REST / REST-like
    • Understand
    • Input0:string, Output0: string
    • What does SeqRet actually do?
    • Example data? Parameter configurations? Input-Output correlations?
    • Use
    • Quality of Service, Monitoring, Robustness
    • Volatility, Sustained, License, Conditions of Use
  • 11.  
  • 12. Cataloguing to avoid reinvention
    • Investigator and project specific registries
    • Community lists
    • Specialist registries
    • General catalogues and search engines
  • 13. An Open, Public, Curated, Boutique Catalogue for Web Services serving the Life Sciences, for the Bioinformatics Community. http://www.biocatalogue.org Launched June 2009. Nucl Acids Res, June 2010, Web Servers issue, doi:10.1093/nar/gkq394
  • 14.  
  • 15.  
  • 16. Let's Pool
    • A reliable, trusted, up to date and sustained catalogue customised for the Life Sciences.
      • EMBL-EBI service commitment
    • Discover and use services.
    • Community and specialist curation.
      • Pooled annotations and experiences.
      • A platform for service monitoring and analytics.
    • A resource, A Web Service
      • Incorporate into applications.
    • Building a community of providers and consumers together
  • 17. UNDERSTAND and USE
  • 18. Service Coverage
    • 1719 services – SOAP and REST
      • 92% with service description
      • 57.5% with all ops/methods described
    • >60 classifications
    • Big players: EBI, NCBI, DDBJ etc….
    60 operations on chemistry and chem-informatics data. Figure labels: Protein Seq. Alignment; Protein Structure Prediction; Protein Function Prediction; Nucleotide Seq. Alignment; RNA Structure Prediction; Gene Prediction; Text Mining; Ontology; Phylogeny; Microarray; Sequence Retrieval; Identifier Retrieval; Structure Retrieval; Literature Retrieval; Genomics; Proteomics; Systems Biology; Biostatistics; Chemoinformatics
  • 19. [June 09 - Sep10] Steady use: 2K+ unique IPs/month.
  • 20.
    • Chiefly public services
    • Community contributed
      • Service Providers: 127
      • Third Parties: 92 submitters
      • 420 registered members
      • 27 countries (UK>Spain>USA>Canada)
    • Partners and registries
      • EMBRACE Registry, SeekDa!, (BioMOBY, DAS)
    • Automated crawling
    • Manual mining
    Building Content and Community
  • 21. EMBL-EBI DDBJ NCBI But these statistics have to be interpreted…..
  • 22. Annotations
    • Bio-Services
    • EDAM
    • myGrid
    • BioMOBY…
    • Bioontologies
    • OBO Foundry
    • BioPortal…
    • Services
    • WSMO
    • SAWSDL
    • SA-REST…
    Curation Change logs Quantitative Annotations Tags Semantic Annotations Ontologies Functional Capabilities Provenance Operational Capabilities Operational Metrics Use Policy Social Status Ratings Attribution Free text Instrumentation Usable and Useful Understandable
  • 23. Incremental Annotation 50,672
    • accumulate, aggregation, types, attribution
  • 24. Archived Service Annotations Attribution
  • 25. Tagging Social Annotate Anything Categories
  • 26. Operations Inputs Outputs Example use
  • 27.
    • Availability
    • API changes
    • Test script sandbox
    • Based on EMBRACE Registry Monitoring Framework
    Social Sharing Feeds
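Availability monitoring of the kind listed on this slide comes down to repeated checks and a per-service summary. The sketch below assumes each check has already been reduced to a pass/fail record; the `Check` type and the service names are hypothetical, and this is not the EMBRACE Registry monitoring framework itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """One monitoring probe (hypothetical record layout)."""
    service: str
    ok: bool  # True if the endpoint responded as expected


def availability(checks: list[Check]) -> dict[str, float]:
    """Percentage of successful checks per service."""
    totals: dict[str, tuple[int, int]] = {}
    for check in checks:
        up, seen = totals.get(check.service, (0, 0))
        totals[check.service] = (up + int(check.ok), seen + 1)
    return {service: 100.0 * up / seen for service, (up, seen) in totals.items()}


if __name__ == "__main__":
    history = [
        Check("blast", True), Check("blast", True),
        Check("blast", False), Check("blast", True),
        Check("legacy-db", False),
    ]
    print(availability(history))
```

What counts as "responded as expected" (an HTTP status, a retrievable WSDL, a passing test script) is deliberately left outside the sketch.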
  • 28.
      • WSDL, SAWSDL, SA-REST, WSMO
      • RDF and SPARQL
    Service annotation formats Gadgets, Apps Customised and Private instances A service / resource Open Source (BSD) Open Platform Read (Write) REST APIs
      • EDAM, BioMOBY, myGrid, OBO family, BioXSD
    Annotation Ontologies
  • 29. Content Capture & Curation People Powered Content Reward and Attribution Sensitivities Tools Bringing a Community together Automation Core Contribution & Curation Coordination Governance
  • 30. Governance Blackhole
    • Submission
    • Content
    • Ownership / submitter / curator responsibilities
    • Responsibility migrations
    • Service update
    • Metadata update
    • Notifications
    • Withdrawal
    • Take-down
    • Archiving
    • Preservation
  • 31. Curating third party services is HARD The Reality of Web Services in the Life Sciences The Reality of (Expert) Crowd Sourcing Contributions for a Web Service Catalogue
  • 32. Eight years ago Lincoln Stein said… "An interface is a contract between data provider and data consumer." Stein L. Creating a bioinformatics nation. Nature 2002;417:119-120.
  • 33. A Public interface means a Public Service
    • Thinking local not global
      • Local configuration bake-ins
      • Scalability – I/O and load
      • Interface granularity and interaction chattiness
    • Interface churn
      • Silent API volatility
      • BioCatalogue Change logs
      • Web Interface trumps API
      • Local application trumps dependent external ones
    Ensembl API: updated on every release, not backward compatible with obscured versioning. BioMART: exposed internal identifier formats and then changed them.
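Silent API volatility becomes visible once two snapshots of an interface are compared. The sketch below is one way a change log entry could be derived, assuming each snapshot has been reduced to a mapping from operation name to a tuple of parameter names. That representation and the operation names are assumptions for illustration, not BioCatalogue's actual implementation.

```python
def interface_changes(
    old: dict[str, tuple[str, ...]],
    new: dict[str, tuple[str, ...]],
) -> list[str]:
    """Describe differences between two snapshots of a service interface."""
    log = []
    for op in sorted(old.keys() - new.keys()):
        log.append(f"removed operation {op}")
    for op in sorted(new.keys() - old.keys()):
        log.append(f"added operation {op}")
    for op in sorted(old.keys() & new.keys()):
        if old[op] != new[op]:
            log.append(f"changed signature of {op}: {old[op]} -> {new[op]}")
    return log


if __name__ == "__main__":
    # Hypothetical snapshots taken at two monitoring runs.
    before = {"getEntry": ("id",), "search": ("query",)}
    after = {"getEntry": ("id", "format"), "listDatabases": ()}
    for line in interface_changes(before, after):
        print(line)
```

A diff like this catches structural churn only. A change in the format of an identifier, as in the BioMART example, leaves every signature intact and still breaks clients.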
  • 34. (Public) Service Sustainability
    • Staff/funding/project churn
    • 2 year availability, responsibility migration/hole, service decay -> application decay
    • 58% developed by students, 24% stated not maintained
    • (Schultheiss et al. (2010) PLoS Comp Biol (in review))
    • 146 services archived, >90% availability
    Sustainability strategy: Make it portable; Provide documentation; Use existing frameworks and practices; Involve the community and know your users; Plan sunset or migration; Funding models for sustainability; Preservation
  • 35. Schultheiss et al. (2010) PLoS Comp Biol (in review)
  • 36. Geek Usability Quasi-Standards
    • http://xml.nig.ac.jp/rest/Invoke?service={x}&method={y}&...
    • Which service? Need to know precisely what is expected for every service at the same endpoint
    • http://xml.nig.ac.jp/{service}/{method}?...
    • Service-method pairs
    y like http://BASE/op?parameter={value}
  • 37. Geek Usability Quasi-Standards
    • http://www.ebi.ac.uk/cgi-bin/dbfetch?db={db}&id={id}&format={f}
    • Service specified, but the db name and id are in the query.
    • RPC in REST
    • http://www.ebi.ac.uk/dbfetch/{db_name}/{id}?format={f}
    • A resource with an id in a database
    y like http://BASE/op?parameter={value}
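    The difference between the two dbfetch URL styles on this slide can be sketched in a few lines. This is only an illustration: the URL templates are copied from the slide, while the function names and the sample values (`uniprot`, `P12345`, `fasta`) are placeholders chosen here. No request is sent, so nothing is assumed about whether the endpoints still resolve.

```python
from urllib.parse import quote, urlencode


def rpc_style_url(db, entry_id, fmt):
    """RPC in REST: one endpoint, with db, id and format all in the query string."""
    query = urlencode({"db": db, "id": entry_id, "format": fmt})
    return f"http://www.ebi.ac.uk/cgi-bin/dbfetch?{query}"


def resource_style_url(db, entry_id, fmt):
    """Resource style: database and id form the path; format stays a query option."""
    path = f"{quote(db, safe='')}/{quote(entry_id, safe='')}"
    return f"http://www.ebi.ac.uk/dbfetch/{path}?{urlencode({'format': fmt})}"


# Placeholder values, used only to show the shape of each URL.
rpc_url = rpc_style_url("uniprot", "P12345", "fasta")
resource_url = resource_style_url("uniprot", "P12345", "fasta")
print(rpc_url)
print(resource_url)
```

    In the second form the path alone identifies "a resource with an id in a database", so it can be linked, cached and bookmarked without knowing the parameter conventions of the endpoint.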
  • 38. Usability: The What and How are Implicit knowledge
    • No or lots of docs, poor examples
    • Complexity
    • Interfaces and Operation
    • Service families
    [Diagram: Service with multiple Operations; Input, Output, Parameters, Errors]
  • 39.  
  • 40. Behaviour Families
    • Function: Each single operation performs a single domain related task
      • e.g. KEGG or TFmodeller
    • Polymorphic: A single operation performs multiple domain related tasks
      • e.g. searchSimple operation in BLAST DDBJ
    • Patterns: Multiple operations combine to perform a single domain related task
      • e.g. InterProScan (EBI)
    • Wrapper pattern: Service wraps a “metaserver” rather than the services themselves.
      • e.g. RapidMiner, Soaplab
  • 41. Behaviour families
    • Function: e.g. KEGG, TFmodeller
    • Polymorphic: e.g. searchSimple operation in BLAST DDBJ
    • Patterns: e.g. InterProScan (EBI), RapidMiner, Soaplab Server
    [Diagram: Domain Tasks mapped to Invocable operations]
  • 42. Polymorphic One operation multiple functional units
    • BLAST (DDBJ)
    • 1 Operation: searchSimple
    • 5 Functional units
    searchSimple (parameters: query, database, program). PD: protein sequence database; ND: nucleotide sequence database
      • proteinBlast: program blastp, protein database (PD)
      • nucleotideBlast: program blastn, nucleotide database (ND)
      • proteinNucleotideBlast: program tblastn, nucleotide database (ND)
      • nucleotideProteinBlast: program blastx, protein database (PD)
      • nucleotideBlastFrameTranslation: program tblastx, nucleotide database (ND)
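    One way to picture the polymorphic case is as a lookup from functional unit to the parameter values that select it. The sketch below assumes the unit-to-program and unit-to-database-kind pairs listed on this slide. The helper names, the argument dictionary and the database name `example_protein_db` are hypothetical, and this is not the DDBJ client API.

```python
# Functional unit -> (program, database kind), as listed on the slide.
# PD = protein sequence database, ND = nucleotide sequence database.
FUNCTIONAL_UNITS = {
    "proteinBlast": ("blastp", "PD"),
    "nucleotideBlast": ("blastn", "ND"),
    "proteinNucleotideBlast": ("tblastn", "ND"),
    "nucleotideProteinBlast": ("blastx", "PD"),
    "nucleotideBlastFrameTranslation": ("tblastx", "ND"),
}


def functional_unit_for(program):
    """Return the functional unit selected by a given `program` value."""
    for unit, (unit_program, _kind) in FUNCTIONAL_UNITS.items():
        if unit_program == program:
            return unit
    raise ValueError(f"unknown program: {program}")


def search_simple_args(unit, query, database):
    """Build the three searchSimple arguments for one functional unit."""
    program, _kind = FUNCTIONAL_UNITS[unit]
    return {"query": query, "database": database, "program": program}


# Hypothetical call: the database name is a placeholder.
args = search_simple_args("proteinBlast", "MKTAYIAK", "example_protein_db")
print(args)
print(functional_unit_for("blastx"))
```

    The single operation signature hides five distinct tasks; the table is the extra annotation a catalogue user needs in order to choose among them.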
  • 43. Server Wrapper Pattern
    • SOAPLab services operations
    • clear | describe | getLastEvent | getResults | getResultsInfo | getStatus | run | runAndWaitFor | terminate | waitfor |
    • All 100 or so services have same WSDL document.
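    The consequence of the wrapper pattern can be sketched with an in-memory stand-in. This is not Soaplab's real API: the class, the method signatures and the two toy tools are assumptions made for illustration, and only a subset of the operation names above (describe, run, getStatus, getResults) is mirrored, in Python naming style.

```python
class GenericServiceStub:
    """In-memory stand-in for a wrapped tool exposing generic operations only."""

    def __init__(self, name, tool):
        self.name = name
        self._tool = tool   # the wrapped analysis, hidden behind the generic interface
        self._jobs = {}     # job id -> result

    def describe(self):
        return f"analysis: {self.name}"

    def run(self, inputs):
        # Runs synchronously here; a real server would start an asynchronous job.
        job_id = f"{self.name}-{len(self._jobs) + 1}"
        self._jobs[job_id] = self._tool(inputs)
        return job_id

    def get_status(self, job_id):
        return "COMPLETED" if job_id in self._jobs else "UNKNOWN"

    def get_results(self, job_id):
        return self._jobs[job_id]


# Two toy tools behind the identical interface (hypothetical examples).
services = {
    "reverse": GenericServiceStub("reverse", lambda inputs: inputs["sequence"][::-1]),
    "length": GenericServiceStub("length", lambda inputs: len(inputs["sequence"])),
}

results = {}
for name, service in services.items():
    job = service.run({"sequence": "ACGT"})
    if service.get_status(job) == "COMPLETED":
        results[name] = service.get_results(job)
print(results)
```

    The client loop is the same for every tool, which is convenient for invocation but means the interface description says nothing about what each service actually does.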
  • 44. The SOAP/REST technical view over services is not enough Need a functional / task-oriented view
  • 45. Functional Unit annotation
    • Service description abstraction
    • Services as functional tasks
    • Within the boundary of a service
    • Independent from technology used
    [Diagram: Service with multiple Operations; WSDL, REST, DAS] [Missier et al. 2010, Functional Units: Abstractions for Web Service Annotations]
  • 46. Complexity because it’s a database really. SABIO–RK, service-only Taverna workflow: find chemical reactions that are associated with a given metabolite, and the kinetics associated with those reactions.
  • 47. Reflections
    • Writing reusable, reliable (public) services with good and stable interfaces for others is hard
    • A service interface is different to a web interface or a database query interface.
    • Public interfaces – internal interfaces mismatch
    • Publishing an interface is a publishing step.
    • Technologist – User mismatch
    • Eat your own dog food
    • Takes resource, time and trouble
    • But will pay off! We can’t afford to reinvent.
  • 48. Enterprise Concerns: real or perceived?
    • Security
      • HTTPS trusted peers inside a firewall
      • WS-Security and OAuth (REST)
      • Or is it fear of using external data?
    • Performance
      • Signature granularity and chattiness
      • Data shipping vs reference shipping
      • XML and JSON are not the only formats
    • Governance
      • Service Level Agreements
    Technical or social issues?
  • 49. Collaborative Curating
    • Socialising the community
    • Rewarding contributors
    • 10:90 long tail rule
    • Content feedback spiral
    • Feedback sensitivities
    • Reputation protection
    • Widen - Smart application feeds
    • Resourced core content team
  • 50. Cost of Crowd Curation
  • 51. Take home
    • Emerging, evolving, exciting and challenging Web service ecosystem
    • BioCatalogue draws together services, knowledge and community to provide intelligence.
    • Crowd collaboration to scale contribution, core to coordinate
    • Open effort – contribute or adopt
    • Core resource – for Alliances and Journals
    • Social + technical challenges
    • Christian Hauck’s talk 16.00 Thursday.
  • 52. Credits: Thomas Laurent, Hamish McWilliams, Franck Tanoh, Jiten Bhagat, Carole Goble, Steve Pettifer, Katy Wolstencroft, Robert Stevens, David De Roure, Mannie Tagarira, Jerzy Orlowski, Sergejs Aleksejevs, Rodrigo Lopez, Eric Nzuobontane
  • 53.  
  • 54. Thank You. http://www.biocatalogue.org About Us - http://wiki.biocatalogue.org API Docs - http://apidocs.biocatalogue.org 11th July 2010 ISMB 10 Bhagat, J., Tanoh, F., Nzuobontane, E., Laurent, T., Orlowski, J., Roos, M., Wolstencroft, K., Aleksejevs, S., Stevens, R., Pettifer, S., Lopez, R., Goble, C.A.: BioCatalogue: a universal catalogue of web services for the life sciences, Nucl. Acids Res., 2010. doi:10.1093/nar/gkq394