BioCatalogue talk by Carole Goble. In these slides she outlines the reasons behind the BioCatalogue project and presents BioCatalogue and its goals.
Better software, better service, better research: The Software Sustainabilit... - Carole Goble
Ever spotted some great looking software only to discover you can’t get it, it doesn’t work, there is no documentation to help fix it and the developers don’t have the time or incentive to help? Ever produced some software that you want to be widely used or have folks contribute? What’s the sustainability of that key platform/library/tool /database your lab uses day in and day out? Are you helping the providers? The same issues stand for Data (or as we now say “FAIR” Findable, Accessible, Interoperable, Reusable Data) and its metadata. Is anyone looking out for Europe’s data services– the datasets and analysis systems you use and you make – the standards they use and the curators and developers who make them? Or is FAIR just a FAIRy story? I’ll tell how two organisations with quite different structures and approaches - the UK’s Software Sustainability Institute and the ELIXIR European Research Infrastructure for Life Science Data – are working for the common goal of better software, better service, and better research.
https://www.rothamsted.ac.uk/events/14th-international-symposium-integrative-bioinformatics
What is Reproducibility? The R* brouhaha (and how Research Objects can help) - Carole Goble
presented at the First International Workshop on Reproducible Open Science @ TPDL, 9 Sept 2016, Hannover, Germany
http://repscience2016.research-infrastructures.eu/
presentation at https://researchsoft.github.io/FAIReScience/, FAIReScience 2021 online workshop
virtually co-located with the 17th IEEE International Conference on eScience (eScience 2021)
Keynote on software sustainability given at the 2nd Annual Netherlands eScience Symposium, November 2014.
Based on the article
Carole Goble, "Better Software, Better Research," IEEE Internet Computing, vol. 18, no. 5 (Sept.-Oct. 2014), pp. 4-8, IEEE Computer Society.
http://www.computer.org/csdl/mags/ic/2014/05/mic2014050004.pdf
http://doi.ieeecomputersociety.org/10.1109/MIC.2014.88
http://www.software.ac.uk/resources/publications/better-software-better-research
FAIR Computational Workflows
Computational workflows capture precise descriptions of the steps and data dependencies needed to carry out computational data pipelines, analyses and simulations in many areas of science, including the Life Sciences. The use of computational workflows to manage these multi-step processes has accelerated in the past few years, driven by the need for scalable data processing, the exchange of processing know-how, and the desire for more reproducible (or at least transparent) and quality-assured processing methods. The SARS-CoV-2 pandemic has strikingly highlighted the value of workflows.
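The "steps plus data dependencies" idea can be made concrete with a small sketch (the step names and dependency graph below are invented for illustration, not taken from any particular workflow system): a workflow is a dependency graph from which the execution order is derived rather than hand-coded.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical four-step analysis: each step names the steps whose
# outputs it depends on; the run order is computed, not hand-coded.
steps = {
    "clean_reads":   [],                          # no upstream dependencies
    "align":         ["clean_reads"],             # needs cleaned reads
    "call_variants": ["align"],                   # needs alignments
    "report":        ["align", "call_variants"],  # needs both
}

def run_order(dag):
    """Return an execution order that respects every data dependency."""
    return list(TopologicalSorter(dag).static_order())

order = run_order(steps)
print(order)  # clean_reads, then align, then call_variants, then report
```

Real workflow managers such as Galaxy, Snakemake and Nextflow add scheduling, caching and provenance tracking on top of exactly this kind of dependency resolution.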
This increased interest in workflows has been matched by the number of workflow management systems available to scientists (Galaxy, Snakemake, Nextflow and 270+ more) and the number of workflow services such as registries and monitors. There is also recognition that workflows are first-class, publishable Research Objects, just as data are. They deserve their own FAIR (Findable, Accessible, Interoperable, Reusable) principles and services that cater for their dual roles as explicit method descriptions and executable software methods [1]. To promote long-term usability and uptake by the scientific community, workflows (as well as the tools that integrate them) should become FAIR+R(eproducible) and citable, so that authors' credit is attributed fairly and accurately.
The work on improving the FAIRness of workflows has already started, and a whole ecosystem of tools, guidelines and best practices is under development to reduce the time needed to adapt, reuse and extend existing scientific workflows. An example is the EOSC-Life Cluster of 13 European Biomedical Research Infrastructures, which is developing a FAIR Workflow Collaboratory based on the tools ecosystem of ELIXIR, the European Research Infrastructure for Life Science Data. While there are many tools addressing different aspects of FAIR workflows, many challenges remain in describing, annotating and exposing scientific workflows so that they can be found, understood and reused by other scientists.
This keynote will explore the FAIR principles for computational workflows in the Life Sciences, using the EOSC-Life Workflow Collaboratory as an example.
[1] Carole Goble, Sarah Cohen-Boulakia, Stian Soiland-Reyes, Daniel Garijo, Yolanda Gil, Michael R. Crusoe, Kristian Peters, and Daniel Schober, "FAIR Computational Workflows," Data Intelligence 2(1-2), 108-121 (2020). https://doi.org/10.1162/dint_a_00033
This presentation was provided by Tim McGeary of Duke University during the NISO virtual conference, Open Data Projects, held on Wednesday, June 13, 2018.
It Takes a Village to Grow ORCIDs on Campus: Establishing and Integrating Uni... - Violeta Ilik
This presentation describes the integration of ORCID identifiers into the open source Vireo electronic theses and dissertations (ETD) workflow, the university's digital repository, and the internally-used VIVO profile system.
Presented at Texas Conference on Digital Libraries (TCDL) 2014:
https://conferences.tdl.org/tcdl/index.php/TCDL/TCDL2014/schedConf/program
This presentation describes the work of the Global Alliance for Genomics and Health, and its members, to develop standards and technologies to make genomics and clinical data more findable, accessible, and useful.
Being Reproducible: SSBSS Summer School 2017 - Carole Goble
Lecture 2:
Being Reproducible: Models, Research Objects and R* Brouhaha
Reproducibility is an R* minefield, depending on whether you are testing for robustness (rerun), defence (repeat), certification (replicate), comparison (reproduce) or transferring between researchers (reuse). Different forms of "R" make different demands on the completeness, depth and portability of research. Sharing is another minefield, raising concerns of credit and protection from sharp practices.
In practice the exchange, reuse and reproduction of scientific experiments is dependent on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows and so on along with the narrative. These "Research Objects" are not fixed, just as research is not “finished”: the codes fork, data is updated, algorithms are revised, workflows break, service updates are released. ResearchObject.org is an effort to systematically support more portable and reproducible research exchange.
In this talk I will explore these issues in more depth using the FAIRDOM Platform and its support for reproducible modelling. The talk will cover initiatives and technical issues, and raise social and cultural challenges.
Publishing your research: Research Data Management (Introduction) Jamie Bisset
Publishing your research: Research Data Management (Introduction) (November 2013) slides. Delivered as part of the Durham University Researcher Development Programme. Further Training available at https://www.dur.ac.uk/library/research/training/
A talk about the merger and refactoring of the eagle-i and VIVO ontologies, presented by me together with Brian Lowe, Janos Hajagos, and Erich Bremer at the VIVO 2013 conference in St. Louis.
A keynote given on experiences in curating workflows and web services.
3rd International Digital Curation Conference: "Curating our Digital Scientific Heritage: a Global Collaborative Challenge"
11-13 December 2007
Renaissance Hotel
Washington DC, USA
Keynote presentation by Professor Carole Goble at BOSC (Bioinformatics Open Source Conference) Long Beach, California, USA, July 14 2012. Co-located with ISMB, Intelligent Systems in Molecular Biology
Six Principles of Software Design to Empower Scientists - David De Roure
Keynote talk for Workshop on Managing for Usability:
Challenges and Opportunities for E-Science Project Management, 10-11 April 2008,
OeRC, University of Oxford, UK
NSF Workshop Data and Software Citation, 6-7 June 2016, Boston USA, Software Panel
Findable, Accessible, Interoperable, Reusable Software and Data Citation: Europe, Research Objects, and BioSchemas.org
Being FAIR: FAIR data and model management, SSBSS 2017 Summer School - Carole Goble
Lecture 1:
Being FAIR: FAIR data and model management
In recent years we have seen a change in expectations for the management of all the outcomes of research, that is, the "assets" of data, models, codes, SOPs and workflows. The "FAIR" (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship [1] have proved to be an effective rallying cry. Funding agencies expect data (and increasingly software) management, retention and access plans. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. The multi-component, multi-disciplinary nature of Systems and Synthetic Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Our FAIRDOM project (http://www.fair-dom.org) supports Systems Biology research projects with their research data, methods and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety. The FAIRDOM Platform has been installed by over 30 labs or projects. Our public, centrally hosted Asset Commons, the FAIRDOMHub.org, supports the outcomes of 50+ projects.
Now established as a grassroots association, FAIRDOM has over 8 years of experience of practical asset sharing and data infrastructure at the researcher coal-face, ranging across European programmes (SysMO and ERASysAPP ERANets), national initiatives (Germany's de.NBI and Systems Medicine of the Liver; Norway's Digital Life) and European Research Infrastructures (ISBE), as well as in PIs' labs and centres such as the SynBioChem Centre at Manchester.
In this talk I will explore how FAIRDOM has been designed to support Systems Biology projects and show examples of its configuration and use. I will also explore the technical and social challenges we face.
I will also refer to European efforts to support public archives for the life sciences. ELIXIR (http://www.elixir-europe.org/) is the European Research Infrastructure of 21 national nodes and a hub, funded by national agreements to coordinate and sustain key data repositories and archives for the Life Science community, improve access to them and related tools, support training, and create a platform for dataset interoperability. As Head of the ELIXIR-UK Node and co-lead of the ELIXIR Interoperability Platform, I will show how this work relates to your projects.
[1] Wilkinson et al., "The FAIR Guiding Principles for scientific data management and stewardship," Scientific Data 3 (2016). doi:10.1038/sdata.2016.18
FAIRy stories: the FAIR Data principles in theory and in practice - Carole Goble
https://ucsb.zoom.us/meeting/register/tZYod-ippz4pHtaJ0d3ERPIFy2QIvKqjwpXR
FAIRy stories: the FAIR Data principles in theory and in practice
The ‘FAIR Guiding Principles for scientific data management and stewardship’ [1] launched a global dialogue within research and policy communities and started a journey to wider accessibility and reusability of data and preparedness for automation-readiness (I am one of the army of authors). Over the past 5 years FAIR has become a movement, a mantra and a methodology for scientific research and increasingly in the commercial and public sector. FAIR is now part of NIH, European Commission and OECD policy. But just figuring out what the FAIR principles really mean and how we implement them has proved more challenging than one might have guessed. To quote the novelist Rick Riordan “Fairness does not mean everyone gets the same. Fairness means everyone gets what they need”.
As a data infrastructure wrangler I lead and participate in projects implementing forms of FAIR in pan-national European biomedical Research Infrastructures. We apply web-based, industry-led approaches like Schema.org; work with big pharma on specialised FAIRification pipelines for legacy data; promote FAIR-by-Design methodologies and platforms in the researcher's lab; and expand the principles of FAIR beyond data to computational workflows and digital objects. Many use Linked Data approaches.
In this talk I’ll use some of these projects to shine some light on the FAIR movement. Spoiler alert: although there are technical issues, the greatest challenges are social. FAIR is a team sport. Knowledge Graphs play a role – not just as consumers of FAIR data but as active contributors. To paraphrase another novelist, “It is a truth universally acknowledged that a Knowledge Graph must be in want of FAIR data.”
[1] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
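As a hedged illustration of the web-based, Schema.org-style markup mentioned in the abstract above (the dataset name, DOI and creator below are invented placeholders), a machine-readable dataset description can be emitted as JSON-LD for embedding in a landing page:

```python
import json

# Hypothetical dataset described with Schema.org-style properties;
# embedding such JSON-LD in a web page makes the dataset findable
# by ordinary web crawlers and dataset search engines.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example liver metabolomics dataset",     # invented example
    "identifier": "https://doi.org/10.xxxx/example",  # placeholder DOI
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {"@type": "Person", "name": "A. Researcher"},  # placeholder
}

markup = json.dumps(dataset, indent=2)
print(markup)
```

The Bioschemas effort mentioned elsewhere in these talks extends exactly this pattern with life-science-specific types and properties.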
This presentation was provided by Violeta Ilik of Northwestern University during the NISO Virtual Conference held on Feb 15, 2017, entitled Institutional Repositories: Ensuring Yours is Populated, Useful and Thriving. The DOI for this presentation is http://dx.doi.org/10.18131/G3VP6R
Presentation of the BioCatalogue Web Services registry at the EMBL-EBI Small and Medium Size (SME) workshop in Munich in October 2010. The presentation was given by Eric Nzuobontane.
6. Service and Workflow analytics and network analysis: recommendations and co-use; social networks of third-party, externally hosted services; automated diagnostics, monitoring and metadata curation.
7. Finding and Curating Services (http://www.biocatalogue.org). Drawing on 6 years of experience in Taverna of semantic annotation of services using RDF and OWL ontologies, and on experience at the EBI in service provision. First pilot early November 2008; it will cover major providers (EBI, NCBI, DDBJ) at "bronze" quality and show some at platinum.
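A minimal sketch of what semantic annotation of a service with ontology terms looks like as RDF (the service URI, ontology namespace and property names below are invented for illustration; BioCatalogue's actual annotations used the myGrid service ontology):

```python
# Serialise hand-built RDF triples as N-Triples (stdlib only) to
# annotate a hypothetical service operation with a domain term.
SERVICE = "http://example.org/services/blast#run"             # invented URI
TASK    = "http://example.org/mygrid-like#SequenceAlignment"  # invented term

triples = [
    (SERVICE, "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://example.org/mygrid-like#Operation"),
    (SERVICE, "http://example.org/mygrid-like#performsTask", TASK),
]

def to_ntriples(ts):
    """Serialise (subject, predicate, object) URI triples as N-Triples."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in ts)

doc = to_ntriples(triples)
print(doc)
```

Annotations in this shape can be queried with SPARQL and checked against an OWL ontology, which is what makes ontology-based service discovery possible.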
11. Workflows and Services: four curation modes, each cycling through seed, refine and validate: Curation by Experts; Social Curation by the Crowd; Self-Curation by Contributors; Automated Curation.
12. Multiple Annotation Profiles: user profiles, service profiles and group profiles, each carrying its own annotations that feed ranking functions.
13. Service Profile Curation Model: six facets (Functional, Operational, Operational Metrics, Provenance, Conditions of Use, Social Standing), combining quantitative content (tags, metrics, QoS, usage, versioning) with semantic content (service model and ontologies).
14. Architecture sketch: service profiles (a quantitative service model plus a semantic content model) built from WSDL, WADL, SAWSDL and SA-REST descriptions and third-party execution hosts, feeding finding (browse/shop, search), analytics, ranking, monitoring, curation, customised services and workflows.
15. Service Profile Facets (interface neutral): Functional, Conditions of Use, Operational, Operational Metrics, Social Standing, Provenance.
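The facets listed on this slide can be sketched as a simple record type (the field names below are my paraphrase of the slide labels, not the actual BioCatalogue schema):

```python
from dataclasses import dataclass, field

@dataclass
class ServiceProfile:
    """One field per facet of the slide's service profile model (sketch)."""
    functional: list = field(default_factory=list)           # what it does
    conditions_of_use: str = ""                              # licence/terms
    operational: dict = field(default_factory=dict)          # endpoints, versions
    operational_metrics: dict = field(default_factory=dict)  # uptime, latency
    social_standing: dict = field(default_factory=dict)      # tags, ratings
    provenance: list = field(default_factory=list)           # who curated what

profile = ServiceProfile(functional=["sequence alignment"],
                         conditions_of_use="free for academic use")
print(profile.functional)
```

Keeping the facets as separate fields is what lets different curators (experts, the crowd, automated monitors) each fill in the part of the profile they know about.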
16. The facets are multiply described and dynamic: third-party aggregated feeds and monitoring supply multiple sources, multiple versions and multiple instances; they support discovery, interoperability, composition and reuse; descriptions range from trusted authorities, policies, ontologies and controlled vocabularies to tags, free text and folksonomies, using standards such as W*DL, Atom and schemas.
17. The same facets feed ranking.
18. Pay as you Go, Emergent Curation: just enough, just in time, not just in case. What is the return for the investment? On a gain/pain trade-off, folksonomy tagging alone is very bad, hard-core full-on ontology curation is good but unlikely, and "just right" is metadata rich enough for effective reuse.
31. Finding, curating and reusing workflows: connecting scientists in the wild. A supermarket for workflow users; a toolbox for workflow creators; social networking over commodities; different disciplines. 1,200+ members from 114 countries; 50,000+ workflow downloads; 1,500-2,000 unique visitors/month; 460+ workflows; 98 groups; 35+ packs. Running for just over a year. Joint Manchester and Southampton; project leader: Prof David De Roure.
35. BioCatalogue Team: Thomas Laurent, Hamish McWilliams, Franck Tanoh, Jiten Bhagat, Carole Goble, Rodrigo Lopez, Eric Nzuobontane.
The plan for this talk was to highlight what BioCatalogue is and to give a demo, but unfortunately the demo is not ready, so I will use some screenshots to show what is really going on and what to expect next from BioCatalogue. Background of the talk: there are lots of databases and data resources; Feta existed but could not annotate all the services; hence BioCatalogue.
Services are methods too.
Fix, File and Forget is curation, in a way. Assets are used, we hope: by applications and scientists who had anticipated using them, and by applications and scientists that had not, or in ways that were unanticipated.
Of course it isn't as clean as that, and the pieces are highly interrelated.
Workflows are combinations of services; they are external, not self-contained or isolated. Service and workflow analytics and network analysis; service diagnostics and monitoring; automated curation.
Get service providers involved; get the community involved. 3,500+ service operations, but only ~700 annotated in Feta. myGrid Service Ontology; annotation and curation pipeline; curation and discovery tools. Other registries: DAS Registry, BioMOBY Central, SeekDa …
Scientists are naughty, and reuse is hard. We have to try services to find out what they do (IVOA referred to this too). "I used it last time so it will work again the same way… damn!" Services change location, capabilities and signatures (BioMART changed its interface three times in 2006); new ones appear and existing ones disappear (SeqHound); they decay and become outdated or unreliable.
Services in the wild are frequently, er, disappointing and hard to use ("Rubbish™"). Writing reusable workflows is hard: local services, permissions, licences, and "what does it DO?". Writing reusable services is hard: what does it DO? Predicting the unknown required by the unknown. Finding workflows, services and tools is hard: where do you go, and what does it DO? Creating web services is still a bottleneck; for quick solutions it is still seen as too much extra trouble.
Ruin, not fix-file-forget: services are not deposited and preserved in software libraries. Rapid metadata heart-beat, especially on operational metadata. (Could use the previous slide from the DCC talk.) Shadows: method archives record what it was that can be used again. They are referred to, but there is no SLA to be stable or standard, so they constantly need tending or else they go stale (cf. IVOA service validation, DAS). Not software libraries. BioNanny, using Grid tools; versioning of workflows (Andrea); regular health checks. Use myExperiment to notify scientists of potential problems and to be smart about which services should be monitored. Workflows are deposited, but they are not self-contained: they link to external services in flux, depend on software, and incorporate services unavailable to others. Hence workflow fragility and decay: workflows become plans and provenance rather than working scientific objects unless tended and updated.
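The "regular health checks" idea in these notes can be sketched as a simple classifier over a service's recent check history (the thresholds and status labels below are invented for illustration, not BioCatalogue's actual monitoring policy):

```python
# Classify a service from its recent health-check history: services that
# repeatedly fail are flagged so owners of dependent workflows can be
# notified before their workflows silently decay.
def classify(history, window=3):
    """history: list of booleans, True = check passed, most recent last."""
    if not history:
        return "unmonitored"
    recent = history[-window:]
    if all(recent):
        return "healthy"
    if not any(recent):
        return "down"      # every recent check failed
    return "unstable"      # intermittent failures: likely decaying

print(classify([True, True, True]))           # healthy
print(classify([True, False, False, False]))  # down
print(classify([True, False, True]))          # unstable
```

Prioritising which services to check at all could then use registry data, for example checking most often the services that appear in the most downloaded workflows.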
In particular, a platform for research into curation practices, as in the panel today. Expert curation is library-like; suppliers and the crowd are the web side; automated is…
The group profile is the interrelationships between the services: co-reference, co-use, …
Curation includes versioning; analytics includes monitoring.
OAIS? From the model point of view. From the standoff annotation point of view. Metadata richness.
Skipped all but the core in talk. OAIS? From the model point of view. From the standoff annotation point of view. Metadata richness.
From the model point of view; from the standoff, model-neutral annotation point of view. Bronze, silver, gold and platinum compliance levels.
Frankly, is it worth it to do the detailed stuff?
Richness spectrum (spoke to it but probably should have skipped). The quality and completeness of metadata: graceful decay from platinum to bronze. Semantic Web services: the IVOA talk asked "why and when semantics?" Here is an answer; it leads to multiple pipelines and multiple… Scientist finding: simple classifications on a few properties; simple queries to reduce the search space, with the final decision left to the user; biological terms; heavy use of provenance, reputation, usage patterns, operational properties, example configurations and boring stuff like that. Think Amazon: the interface is the thing. Automation, validation and execution: rich metadata for automatic service configuration, invocation and fault management; rich descriptions for reasoning about mismatches, debugging, repair and automated composition. Hard and time-consuming.
A joint Manchester-EBI project.
Technical infrastructure, but it's still not all joined up! Feta keeps coming and going. Grid service descriptions are produced by annotating services with terms from the myGrid ontology and are stored in a central registry, GRIMOIRES. Services are found using the Feta discovery service [5]. We have piloted expert manual annotation tools augmented by automated tools using information extraction techniques.
These are not our scientists or our projects; we have none. It's just scientists in the wild, 50% USA and UK. Google Analytics says: 1,931 unique visitors for 3rd Sept to 3rd Oct; 1,698 unique visitors for 3rd Aug to 2nd Sept. myExperiment currently has 1,203 users, 98 groups, 460 workflows, 130 files and 36 packs. Extreme Web 2.0: 18 months old; built on Ruby on Rails; BSD licence; source code hosted on RubyForge; publicly available; 2 core developers, 50% in Southampton, 50% in Manchester; user-driven design and development. 959 active users; 1,429 unique IP visits in the last month; 82 groups; 248 group memberships; 296 workflow entries, 425 workflow versions; 101 files; 1,382 taggings; 46,427 downloads; 77,393 viewings; 408 creditations; 12 packs (with 237 total entries).
Towards repeatable, reproducible, comparable and reusable research