BioCatalogue Talk Slides



BioCatalogue talk by Carole Goble. The slides outline the reasons behind the BioCatalogue project and present BioCatalogue and its goals.

  • The plan for this talk was to highlight what BioCatalogue is and to give a demo, but unfortunately the demo is not ready. Some screenshots will instead show what is really going on and what to expect next from BioCatalogue. Background of the talk: lots of databases and data resources; Feta, but can't annotate all the services; BioCatalogue.
  • Services are methods too.
  • Fix, File and Forget is curation in a way. Assets are used, we hope, by applications and scientists who had anticipated using them, and by applications and scientists that had not, or in ways that were unanticipated.
  • Of course it isn’t as clean as that. And highly interrelated.
  • Workflows are combinations of services. External. Not self-contained or isolated. Service and workflow analytics and network analysis. Service diagnostics and monitoring. Automated curation.
  • Get service providers involved, get the community involved. 3500+ service operations, but only 700ish annotated in Feta. myGrid Service Ontology. Annotation and curation pipeline. Curation and discovery tools. Other registries: DAS Registry, BioMOBY Central, SeekDa …
  • Scientists are naughty. Reuse is hard. We have to try them to find out what they do… IVOA referred to this too. … I used it last time so it will work again the same way… damn! Services change location, capabilities and signatures (BioMART changed its interface three times in 2006). New ones appear and existing ones disappear (SeqHound). They decay and become outdated or unreliable.
  • Services in the Wild are frequently, er, disappointing and hard to use (Rubbish™). Writing reusable workflows is hard. Local services. Permissions. Licences. What does it DO? Writing reusable services is hard. What does it DO? Predicting the unknown required by the unknown. Finding workflows, services and tools is hard. Where do you go?? What does it DO?? Creating web services is still a bottleneck. For quick solutions it is still seen as too much extra trouble.
  • Ruin. Not fix, file, forget. Services are not deposited and preserved in software libraries. Rapid metadata heart-beat, especially on operational metadata. Could use previous slide in DCC talk. Shadows. Method archives. Shadows – what it was that can be used again. They are referred to. No SLA to be stable or standard. Constantly need tending or else they go stale (cf. IVOA service validation, DAS). Not software libraries. BioNanny – using Grid tools. Versioning of workflows – Andrea. Regular health checks. Use myExperiment to notify scientists with potential problems. Use myExperiment to be smart about which services should be monitored. Workflows are deposited but… not self-contained. Linking to external services in flux. Or depend on software. Incorporating services unavailable to others. Workflow fragility and hence decay. Workflows become plans and provenance rather than working scientific objects unless tended and updated.
  • In particular, a platform for research into curation practices, as in the panel today. Expert – is library-like. Suppliers and crowd are the web side. Automated is
  • Group profile is the interrelationships between the services. Co-reference, Co-use,
  • Curation includes versioning. Analytics includes monitoring.
  • OAIS? From the model point of view. From the standoff annotation point of view. Metadata richness.
  • Skipped all but the core in talk. OAIS? From the model point of view. From the standoff annotation point of view. Metadata richness.
  • From the model point of view. From the standoff, model-neutral annotation point of view. Bronze, silver, gold and platinum compliance levels.
  • Frankly, is it worth it to do the detailed stuff?
  • Richness spectrum. Spoke to it but probably should have skipped. The quality and completeness of metadata – graceful decay, platinum to bronze. Semantic Web services. IVOA talk asked – “why and when Semantics”. Here is an answer. Leads to multiple pipelines and multiple. Scientist – Finding: simple classifications on a few properties. Simple queries, reduce search space, final decision with user. Biological terms. Heavy use of provenance, reputation, usage patterns, operational properties, example configurations and boring stuff like that. Think Amazon. The interface is the thing. Automation – Validation and Execution: rich metadata for automatic service configuration, invocation and fault management. Rich descriptions for reasoning: mismatches, debugging, repair. Rich descriptions for reasoning: automated composition. Hard and time-consuming.
  • Joint project Manchester-EBI
  • Technical infrastructure. But it's still not all joined up!! Feta keeps coming and going. Grid service descriptions are produced by annotating services with terms from the myGrid ontology, stored in a central registry, GRIMOIRES. Services are found using the Feta discovery service [5]. We have piloted expert manual annotation tools augmented by automated tools using information extraction techniques.
  • These are not our scientists or our projects. We have none. It's just scientists in the wild. 50% USA and UK. Google Analytics says: 1931 unique visitors for 3rd Sept to 3rd Oct; 1698 unique visitors for 3rd Aug to 2nd Sept. myExperiment currently has 1203 users, 98 groups, 460 workflows, 130 files and 36 packs. Extreme Web 2.0. 18 months old. Built on Ruby on Rails. BSD License. Source code hosted on RubyForge. Publicly available. 2 core developers. 50% in Southampton, 50% in Manchester. User driven design and development. 959 active users. 1429 unique IP visits in last month. 82 groups. 248 group memberships. 296 workflow entries, 425 workflow versions. 101 files. 1382 taggings. 46,427 downloads. 77,393 viewings. 408 creditations. 12 packs (with 237 total entries).
  • Towards repeatable, reproducible, comparable and reusable research
  • Didn’t go into details
  • I have no picture of Dave Newman
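The notes above mention services that silently change their signatures and the need for regular health checks. The sketch below shows one simple way such a check could be expressed: fingerprint the operation signatures recorded at registration time and compare them on each monitoring pass. It is not BioNanny or any BioCatalogue code; the function names, the dictionary layout and the example operation are all assumptions made for illustration.

```python
import hashlib


def signature_fingerprint(operations):
    """Hash a service's operation signatures, independent of listing order.

    `operations` maps an operation name to an (inputs, outputs) pair of
    type-name tuples. This layout is an assumption made for the sketch.
    """
    lines = sorted(
        f"{name}({','.join(inputs)})->({','.join(outputs)})"
        for name, (inputs, outputs) in operations.items()
    )
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def health_check(recorded_fingerprint, current_operations):
    """Return 'unreachable', 'changed' or 'ok' for one monitoring pass."""
    if current_operations is None:  # the description could not be fetched
        return "unreachable"
    if signature_fingerprint(current_operations) != recorded_fingerprint:
        return "changed"
    return "ok"


# Hypothetical service with one operation, as recorded at registration time.
registered = {"runAlignment": (("string",), ("string",))}
recorded = signature_fingerprint(registered)

# A later check finds an extra parameter: the signature has changed.
later = {"runAlignment": (("string", "int"), ("string",))}

print(health_check(recorded, registered))
print(health_check(recorded, later))
print(health_check(recorded, None))
```

Fetching and parsing the live service description is deliberately left out, so the comparison logic can be read and run on its own.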

    1. 1. BioCatalogue. Joint project: Aim: Create a registry of annotated biological web services. Funded by:
    2. 2. Timeline and Approach
       • Started 1st June
       • 6 months Pilot 1
       • Perpetual beta
       • “BioCatalogue-Friends” focus group
       • Extensible software
       • Built to be evolved and to be scaled.
    3. 3. In the Wild Cloud Data Services
       • Major data centres
       • EMBL-EBI, UK; DDBJ, Japan; NCBI, USA; PDBJ, Japan
       • Smaller projects and databases
           o Kanehisa Laboratory, Kyoto, Japan
           o myGrid, Manchester, UK
           o BASIS, University of Newcastle, UK
           o Biomolecular Interaction Network Database, BIND, University of Toronto, Canada
           o GeneCruiser, Broad Institute, Harvard-MIT, USA
           o Genomics and Bioinformatics Group: Lab of Molecular Pharmacology, USA
           o BioMoby
           o Virginia Bioinformatics Institute, USA
           o Center for Biological Sequence Analysis, CBS, Technical University of Denmark
           o Helmholtz open bioinformatics technology, Germany
           o Information Hyperlinked over Proteins, iHOP
           o SIGENAE project, France
           o The Nottingham Arabidopsis Stock Centre, NASC, UK
           o Bioinformatics Competence Center Braunschweig, Germany
           o Gene Ontology visualisation, Goviz
           o Bioinformatics group, Italy
           o The National Centre for Text Mining, NaCTeM
           o Centro de Ciencias Genómicas, UNAM, Mexico
           o e-Fungi, Manchester, UK
           o FUGE bioinformatics platform, Norway
           o Institute of Bioinformatics, Tsinghua University, China
           o EMAP, Edinburgh Mouse Atlas Project, UK
           o The Chemical Informatics and Cyberinfrastructure Collaboratory (CICC), Indiana University, US, ChemSpider
       Variable sustainable stewardship
    4. 4. Digital Curation is…
       • … about maintaining and adding value to a trusted body of digital assets for current and future use by, and on behalf of, a community.
       • … a long term process where those assets are managed, cleaned up and corrected, associated with metadata, annotated and discussed, and appropriately preserved or reliably disposed of.
       • … about enabling assets to be effectively found, understood, and reused in anticipated and unanticipated ways by those who created them, by those who did not, by their home community, and by alien communities.
    5. 5. Curate Processes
       • A repository
       • A means to pool, discover and reuse workflows
       • A means to curate workflows
       • A platform for workflow monitoring and analytics

       • A registry
       • A means to pool metadata about services in the wild
       • A means to discover and reuse those services
       • A means to curate services
       • A platform for service monitoring and analytics
    6. 6. Service and Workflow analytics and network analysis. Recommendations and co-use. Social networks of third party externally hosted services. Automated diagnostics, monitoring and metadata curation.
    7. 7. Finding and Curating Services. Drawing on 6 years' experience in Taverna of semantic annotation of services using RDF and OWL ontologies. Drawing on experience at EBI in service provision. First pilot early November 2008, will cover major providers (EBI, NCBI, DDBJ) at “bronze” quality and show some at platinum.
    8. 8. Web Services in the Wild
       • Findable?
       • The clustalw program from Emboss is called ‘emma’
       • Executable?
       • WSDL / WADL / W*DL
       • Other kinds of services?
       • Understandable?
       • Input0: string, Output0: string
       • What does the polymorphic SeqRet actually do?
       • Example data? Parameter configurations? Input-Output correlations?
       • Poorly documented black boxes.
       • Usable?
       • Quality of Service, monitoring, robustness
       • Stability and dependability
       • Licensing
    9. 9. Writing Reusable stuff is DIFFICULT
       • Predicting the unknown required by the unknown.
       • Scientists and Developers are under pressure and naughty.
    10. 10. Services Mutability and Preservation
        • Services are in constant and often silent change.
        • Dynamic and Unstable.
        • Metadata decay (esp. on service instances).
        • Workflow Decay.
        • Monitoring and Repair.
        • BioNanny.
        • Implications for preservation not fossilisation.
        • Implications for sustainability.
    11. 11. Workflows and Services. [Diagram labels: Curation by Experts, Social Curation by the Crowd, Self-Curation by Contributors, Automated Curation; each marked with “seed”, “refine” and “validate”.]
    12. 12. Multiple Annotation Profiles User Profile Service Profile Profile Annotation Profile Annotation Profile Annotation Ranking Functions Group Profile
    13. 13. Service Profile Curation Model Quantitative Content Tags Service Model Semantic Content Model Ontologies Functional Provenance Operational Operational Metrics Conditions of Use Social Standing 6 facets Versioning QoS Usage
    14. 14. A.N. Other Execution at Host Service Profile Finding WSDL WADL S-A.N. Other SAWSDL SA-REST Analytics Ranking Browse/Shop Search Customised Services Workflows Monitoring Profiles Curation Quant’ve Service Model Semantic Content Model
    15. 15. Service Profile Facets. Services, Interface Neutral: Functional, Conditions of Use, Operational, Social Standing, Operational Metrics, Provenance.
    16. 16. Services Interface Neutral Functional Conditions of Use Operational Social Standing Operational Metrics Provenance Multiply described Third Party Aggregated Feeds Monitoring Multiple Sources Multiple Versions Dynamic Multiple Instances Discovery Interoperability Composition Reuse Trusted Authorities Policies Ontologies Controlled Vocabularies Tags Free text Folksonomies Standards W*DL Atom Schemas
    17. 17. Services Interface Neutral Functional Conditions of Use Operational Social Standing Operational Metrics Provenance Multiply described Third Party Aggregated Feeds Monitoring Multiple Sources Multiple Versions Dynamic Multiple Instances Discovery Interoperability Composition Reuse Trusted Authorities Policies Ranking
    18. 18. Pay as you Go, Emergent Curation. Just enough, Just in Time, not Just in Case. What is the Return for the Investment? [Chart labels: Gain; Pain; Very BAD; Good, but Unlikely; Just right; Folksonomy Tagging; Hard Core full on Ontology Curation.] Rich enough metadata for effective reuse.
    19. 19.
        • Scientist – Finding.
        • Simple metadata on a few properties. Smart tools. “Coarse grained”.
        • Decision Support. Simple Ontologies. Folksonomies. Indexing. Matching.
        • Automation – Composition, Validation and Execution.
        • Rich metadata for automatic service configuration, invocation, debugging, repair, automated composition
        • Decision making. Rich ontologies. Reasoning.
        • Scientist – (Re)Using.
        • Richer metadata explanation on the inputs, outputs and each operation.
    20. 20. myGrid History - Feta
    21. 21.
        • 3500+ service operations
        • 700+ annotated by full-time curator.
        • Feta and Find-O-Matic discovery tools
    22. 23. BioCatalogue: The pilot
        • Features:
            o User Registration
            o Service Registration
            o Search
            o Annotation
            o Notification
            o Integration with myExperiment
            o Keep it simple
    23. 24. Service Coverage + EMBRACE
    24. 28. Roadmap – Perpetual Beta
        • Services
        • BioMoby and Embrace support
        • Support for REST services
        • Operational Metrics
        • Service monitoring
        • Notifications
        • “Test a service”
        • Discovery
        • Enhancing search functionality
        • Semantic search
        • Facetted Browsing a la Amazon
        • Customised ranking
        • Curation
        • Semantic annotation
        • Usage metrics collection
        • Improved user interfaces
        • Third Party integration
        • REST APIs
        • Third party scavenging and monitoring – SeekDa!, BioMOBY
        • myExperiment integration
    25. 29. Importers Importers Ontology Editor Ontologist BioCatalogue Catalogue Manager Service Providers Service Provider Workbench Domain Services Bio Web Services Extraction Importers Curator Workbench Expert Curator Chameleon change handler Discovery Service EB-eye Search Scientists Ontology Exporter Curation and Acquisition Tools Discovery Services Backend Catalogue Services Ontology Services “ Shopping” Web Interface Find-O-Matic Auto Annotation Advanced Finding Web Service Interface BioNanny Monitor Reviewing Feedback Blogging Tags Service Providers Tool Developers Web Browser Tool Developers Tags Community analysis Service analysis Community Use Monitor Community Tools + Tags Scientists EB-eye Ranking Matching
    26. 30. Sister Project. Close partnership. Social Curation. Shared Code.
    27. 31. Finding, curating and reusing workflows. Connecting Scientists in the Wild. A supermarket for workflow users. A toolbox for workflow creators. Social networking over commodities. Different disciplines. 1200+ members from 114 countries. 50000+ workflow downloads. 1500-2000 unique visitors / month. 460+ workflows. 98 groups. 35+ packs. Running for just over a year. Joint Manchester and Southampton. Project leader: Prof David De Roure
    28. 32.
        • Workflows, simulations, scripts, experimental plans, statistical models, ...
        • Bottom up e-Science repository for Scientific Research Objects
        • Sharing to propagate expertise and build reputation.
        • Collaboration.
        • Towards reusable and comparable research.
    29. 33. Open and off the shelf…
        • … Open to workflow systems (Taverna, Trident, BPEL…)
        • … Open to voluntary added applications.
        • … Web Services and scripts
        • … Browser mashups
        • … Applications and tools
        • … User’s environments
        Google Gadget. Web 2.0 protocols, Open Archive Initiative, Linked Open Data, RESTful APIs, Global, persistent URIs
    30. 34. More Information
        • BioCatalogue website
        • BioCatalogue wiki
        • myGrid website
    31. 35. BioCatalogue Team: Thomas Laurent, Hamish McWilliams, Franck Tanoh, Jiten Bhagat, Carole Goble, Rodrigo Lopez, Eric Nzuobontane
    32. 36. my Grid+ Team
    33. 37. Curation Sweatshop
        • Steady increase in numbers of services and workflows
        • Users able to find annotated services
        • BUT
        • Time-consuming and expensive.
        • More and more services built daily
        • SO
        • We should enable suppliers to add value
        • We should get users involved
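
The slides describe a service profile with six facets (functional, operational, operational metrics, provenance, conditions of use, social standing) that is filled in incrementally by experts, the crowd, contributors and automated tools. A minimal sketch of that idea follows, assuming annotations are kept as separate records that each name a facet and a source. The class names, field names and example annotations are hypothetical and are not BioCatalogue's actual data model.

```python
from dataclasses import dataclass, field

# The six facets named on the "Service Profile Facets" slide.
FACETS = (
    "functional",
    "operational",
    "operational_metrics",
    "provenance",
    "conditions_of_use",
    "social_standing",
)


@dataclass
class Annotation:
    facet: str   # one of FACETS
    value: str   # tag, ontology term or free text
    source: str  # e.g. "expert", "crowd", "contributor", "automated"


@dataclass
class ServiceProfile:
    name: str
    annotations: list = field(default_factory=list)

    def annotate(self, facet, value, source):
        """Attach one annotation; reject facets outside the six."""
        if facet not in FACETS:
            raise ValueError(f"unknown facet: {facet}")
        self.annotations.append(Annotation(facet, value, source))

    def described_facets(self):
        """Facets that have at least one annotation."""
        return {a.facet for a in self.annotations}

    def missing_facets(self):
        """Facets still undescribed, in FACETS order."""
        described = self.described_facets()
        return [f for f in FACETS if f not in described]


# Illustrative use: pay-as-you-go annotation from two different sources.
profile = ServiceProfile("emma")
profile.annotate("functional", "multiple sequence alignment", "expert")
profile.annotate("social_standing", "worked for me", "crowd")
print(profile.missing_facets())
```

Keeping each annotation as its own record with a source is one way to read the "standoff annotation" idea in the notes: the service description itself is untouched, and a profile can start sparse and grow richer as more sources contribute.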