Biocatalogue, FileQuirks, MyExperiment



Presentation from IIMCB Seminar. Summary from my Fellowship in MyGrid, Manchester

Published in: Technology


  1. 1. Summary from my fellowship in Manchester: e-Science in Manchester. Jerzy Orłowski
  2. 2. What I will talk about <ul><li>Part one: Biocatalogue SearchByData </li><ul><li>Searching for services that will analyze or process your data file
  3. 3. Other ideas that came up along the way </li></ul><li>Part two: Other things I've done in the meantime
  4. 4. Part three: How do they do it in MyGrid </li><ul><li>New methodology that we could adopt
  5. 5. How I can help you, how you can help me </li></ul></ul>
  6. 6. <ul>Part one: Biocatalogue SearchByData </ul>
  7. 7. Biocatalogue <ul><li>The BioCatalogue is a catalogue of Life Science Web Services
  8. 8. A web service is a network application with a programmatic interface
  9. 9. BioCatalogue relies on community annotation </li><ul><li>Service providers
  10. 10. Users
  11. 11. Curator </li></ul><li>Technology: Ruby on Rails </li></ul>
  12. 12. Browsing services
  13. 14. My contribution <ul><li>Search By Data </li><ul><li>Ability to find services not by tags, providers, etc., but by example input files </li></ul><li>Algorithm based on FileQuirks, which is based at GeneSilico
  14. 15. The user provides a real input file, which is matched against the example inputs of all the services using regular expressions
  15. 16. The services most likely to analyze or process the user's file are returned </li></ul>
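The matching idea on these slides can be illustrated with a minimal Python sketch. It is not the BioCatalogue or FileQuirks implementation: the regular expressions, the service names, the example inputs, and the functions `detect_types` and `search_by_data` are all hypothetical and chosen only to show the principle of ranking services by whether their example inputs look like the user's file.

```python
import re

# Hypothetical regex signatures for a few data types.
# These are illustrative only, not FileQuirks' real rules.
TYPE_PATTERNS = {
    "fasta": re.compile(r"\A>\S.*\n[A-Za-z*\-\n]+\Z"),
    "pdb": re.compile(r"^(HEADER|ATOM  |HETATM)", re.MULTILINE),
    "integer_id": re.compile(r"\A\s*\d+\s*\Z"),
}


def detect_types(text):
    """Return the names of all data types whose pattern matches the text."""
    return {name for name, pattern in TYPE_PATTERNS.items() if pattern.search(text)}


def search_by_data(user_file, services):
    """Rank services by how many example inputs share a type with the user file."""
    user_types = detect_types(user_file)
    scores = {}
    for service, examples in services.items():
        hits = sum(1 for example in examples if detect_types(example) & user_types)
        if hits:
            scores[service] = hits
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda name: (-scores[name], name))


# Hypothetical catalogue: service name -> example inputs.
SERVICES = {
    "rna_fold": [">seq1\nGGGAAACCC\n"],
    "structure_check": ["HEADER    TEST\nATOM      1  N   MET A   1\n"],
    "fetch_by_id": ["12345"],
}

ranked = search_by_data(">query\nACGUACGU\n", SERVICES)
print(ranked)
```

In this sketch a service with no example inputs can never be returned, which is the limitation discussed on the following slides.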
  16. 18. Other ideas – getting example input files <ul><li>The main limitation of Search By Data is the lack of example inputs for services </li><ul><li>for 1169 services with more than 3000 operations, there are no more than 500 example inputs
  17. 19. Most of the inputs are numbers or IDs </li></ul><li>Idea – get more inputs: </li><ul><li>From people executing the services </li><ul><li>Taverna Provenance
  18. 20. Soap Servlet </li></ul><li>By having bots execute services with some data </li></ul></ul>
  19. 21. Soap Servlet <ul><li>Automatic generation of web interface for web services:
  20. 22. For users: lets them quickly test or execute a service
  21. 23. For us: lets us collect example inputs for services
  22. 24. Currently – alpha version </li></ul>
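As a rough illustration of the two roles described above (a generated form for users, collected inputs for the catalogue), here is a small Python sketch. It is not the Soap Servlet code: the functions `render_form` and `record_example_inputs`, the operation name `predict`, and the parameter name `sequence` are assumptions made for the example.

```python
from html import escape


def render_form(service, operation, parameters):
    """Build a plain HTML form with one text area per input parameter."""
    rows = []
    for name in parameters:
        safe = escape(name, quote=True)
        rows.append(f'  <label>{safe} <textarea name="{safe}"></textarea></label>')
    action = escape(f"/{service}/{operation}", quote=True)
    return "\n".join(
        [f'<form method="post" action="{action}">']
        + rows
        + ['  <input type="submit" value="Run">', "</form>"]
    )


def record_example_inputs(store, operation, submitted):
    """Keep non-empty submitted values as candidate example inputs."""
    for name, value in submitted.items():
        if value.strip():
            store.setdefault((operation, name), []).append(value)
    return store


# Hypothetical operation and parameter names for a service called "afold".
form = render_form("afold", "predict", ["sequence"])
examples = record_example_inputs({}, "predict", {"sequence": "GGGAAACCC"})
print(form)
print(examples)
```

A real deployment would read the parameter list from the service description rather than from a hand-written list, and would need user consent before storing submitted data.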
  23. 25. Soap Servlet interface for Afold
  24. 26. Soap Servlet interface for Afold
  25. 27. Part 2: Other projects I've done in the meantime
  26. 28. <ul>GeneSilico web services </ul><ul><ul><li>Turning some of our programs into SOAP services
  27. 29. ProteinSilico
  28. 30. ModeRNA
  29. 31. Parts of MetaServer
  30. 32. Parts of MetaRNA
  31. 33. See:
  32. 34. Good documentation on BioCatalogue, used to test Search By Data </li></ul></ul>
  33. 35. <ul>FileQuirks </ul><ul><li>FileQuirks – web server for recognition of biological data types </li><ul><li>New user interface
  34. 36. More data types
  35. 37. Help pages
  36. 38. Summary sent to NAR (waiting for decision) </li></ul><li> </li></ul>
  37. 41. FileQuirks Help Pages <ul><li>I decided to use Joomla CMS </li><ul><li>Help pages have standard format
  38. 42. Joomla makes them easy to write and update
  39. 43. The GeneSilico home page is written in Joomla, so it would be easy to migrate/merge, and a graphic template already exists </li></ul><li>It is easy to add help pages for other services
  40. 44. Software and server list on is outdated
  41. 45. Maybe we should clean up? </li></ul>
  42. 46. GeneSilico web services <ul><li>A web service is a network tool with a programmatic API: “program as a service”
  43. 47. Pros </li><ul><li>Compatibility between languages (XML is the protocol)
  44. 48. Code reuse – no need to install programs
  45. 49. Easy linking with other tools
  46. 50. Automatic user interface generation </li></ul><li>Cons: </li><ul><li>You have to maintain the server
  47. 51. Harder to make it private
  48. 52. Less suitable for systems that take a long time to execute </li></ul></ul>
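To make the "XML is the protocol" point concrete, the sketch below builds and re-reads a SOAP 1.1 request using only the Python standard library. The envelope namespace is the standard SOAP 1.1 one; the service namespace `urn:example:genesilico`, the operation name `runMetaMQAP`, and the parameter name `pdb` are hypothetical and do not describe the real GeneSilico services.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:genesilico"  # hypothetical service namespace


def build_request(operation, arguments):
    """Serialize a call as a SOAP envelope: Envelope > Body > operation > arguments."""
    ET.register_namespace("soapenv", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in arguments.items():
        ET.SubElement(call, name).text = value
    return ET.tostring(envelope, encoding="unicode")


def read_arguments(xml_text, operation):
    """Parse the envelope back into a dict, as a server in any language could."""
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{SOAP_NS}}}Body")
    call = body.find(f"{{{SERVICE_NS}}}{operation}")
    return {child.tag: child.text for child in call}


# Hypothetical operation and parameter names.
request = build_request("runMetaMQAP", {"pdb": "ATOM      1  N   MET A   1"})
print(request)
print(read_arguments(request, "runMetaMQAP"))
```

Because the request is plain XML, the client and the server do not need to share a programming language, which is the compatibility advantage listed under "Pros".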
  49. 53. Example 1 <ul><li>MetaMQAP </li><ul><li>Kudlaty's Chimera MetaMQAP plugin uses MetaMQAP (he wrote his own interface)
  50. 54. Toolkit uses MetaMQAP
  51. 55. I have also written scripts for using MetaMQAP
  52. 56. Conclusions: </li><ul><li>MetaMQAP needs to be installed and maintained on many different systems by different people
  53. 57. Making a SOAP server will save people time </li></ul></ul></ul>
  54. 58. Example 2 <ul><li>Methods for RNA secondary structure prediction </li><ul><li>They are used by RNA MetaServer
  55. 59. Tomek Puton uses them for CompaRNA
  56. 60. They were used by me for testing Search By Data
  57. 61. Conclusions: </li><ul><li>SOAP interface for fast methods exists
  58. 62. It just needs updating and incorporating into other tools </li></ul></ul></ul>
  59. 63. GeneSilico web services Instructions on:
  60. 64. How do they do it in MyGrid? Some methodology we might adopt or just be aware of
  61. 65. Working system <ul><li>They do not do science itself – they make tools for scientists </li><ul><li>And they study how new technologies are adopted in science </li></ul><li>Every project is a collaboration with other groups
  62. 66. There is always more than one person working on a project </li><ul><li>more than 25% of time is spent in meetings </li></ul><li>Code developers are not scientists, but employees </li><ul><li>They do not write papers or grants </li></ul></ul>
  63. 67. Working system <ul><li>2 “uncommon” positions </li><ul><li>Project manager </li><ul><li>Not a scientist
  64. 68. Not a developer
  65. 69. Responsible for: </li><ul><li>keeping up with release schedule
  66. 70. grant schedule
  67. 71. cooperation between projects </li></ul></ul><li>Service curator </li><ul><li>Not a developer
  68. 72. Responsible for </li><ul><li>keeping in touch with user community
  69. 73. Organizing meetings with focus groups, jamborees, etc.
  70. 74. Service documentation
  71. 75. Service visibility: Wikipedia, Google, links ... </li></ul></ul></ul></ul>
  72. 76. Working system <ul><li>No seminars
  73. 77. Instead, a weekly meeting on progress in all projects
  74. 78. A lot of project-dedicated meetings and teleconferences </li></ul>
  75. 79. Sharing policy <ul><li>Code and ideas are shared from the very beginning of the project </li><ul><li>A scientific finding can be published only once, but tools can keep getting better
  76. 80. Selling your ideas enables cooperation and makes tools compatible – better grants
  77. 81. Publishing your code (git, svn) gets you more users – nice for publications and grants </li></ul></ul>
  78. 82. Development <ul><li>Languages: Java and Ruby on Rails
  79. 83. All code is under version control </li><ul><li>Massive branching and merging </li></ul><li>Dependency management systems (Maven)
  80. 84. All services are hosted </li><ul><li>Collaborations (EMBL-EBI)
  81. 85. Corporate hosting
  82. 86. Clouds (Amazon EC2) </li></ul><li>Building a user community </li></ul>
  83. 87. Summary – what we could discuss <ul><li>Programmatic interfaces (Middleware) </li><ul><li>I can make SOAP interfaces for you, deploy and publish them
  84. 88. I would require you to use such interfaces in your future code </li></ul><li>What else I can offer: </li><ul><li>CMS for public help pages for programs and web servers </li></ul><li>What I'd like to ask </li><ul><li>To test FileQuirks, GeneSilico web services and Search By Data </li></ul><li>Sharing policy for software development projects
  85. 89. SVN and Web pages cleanup
  86. 90. Visibility on the web (proper linking, Wikipedia, various lists, etc.) </li></ul>
  87. 91. Acknowledgments <ul><li>MyGrid </li><ul><li>Carole Goble
  88. 92. Charlotte Hooson-Sykes
  89. 93. Jiten Bhagat, Franck Tanoh, Shoaib Sufi, Peter Li and others </li></ul><li>University of Southampton </li><ul><li>David De Roure
  90. 94. David Newman and others </li></ul><li>Leiden University </li><ul><li>Marco Roos </li></ul></ul><ul><li>GeneSilico </li><ul><li>Janusz Bujnicki
  91. 95. Iga Korneta
  92. 96. Piotr Iwaniuk, Jakub Jopek, Bartosz Bedyński, Artur Skarżycki </li></ul></ul>