Galaxy

Title: Galaxy
Author: James Taylor

Published in: Business, Technology

Galaxy

  1. 1. Galaxy (http://g2.bx.psu.edu)
  2. 2. What is Galaxy? • An open-source framework for integrating various computational tools and databases into a cohesive workspace • A web-based service we (Penn State) provide, integrating many popular tools and resources for comparative genomics • A completely self-contained Python application for building your own Galaxy style sites
  3. 3. Galaxy’s web user interface
  4. 4. Integrating tools into Galaxy
  5. 5. How Galaxy integrates existing web-based tools
  6. 6. Proxy based tools User makes request to Galaxy
  7. 7. Proxy based tools Galaxy delegates request to external site
  8. 8. Proxy based tools External site generates response • If data, Galaxy determines type, processes, and adds to ‘history’ • Otherwise, return response to user
  9. 9. External tools User makes request to Galaxy
  10. 10. External tools Galaxy sends user directly to external site with extra URL data
  11. 11. External tools User interacts directly with external site
  12. 12. External tools When data is generated, the user is sent back to Galaxy. Galaxy can fetch the data immediately, or wait for notification from the external site
  13. 13. How Galaxy integrates existing command line tools
  14. 14. HTML inputs generated from abstract parameter description
  15. 15. HTML inputs generated from abstract parameter description
  16. 16. HTML inputs generated from abstract parameter description
  17. 17. HTML inputs generated from abstract parameter description
  18. 18. Tool help generated from a simple text format
  19. 19. Automatic input validation based on type, or more...
  20. 20. Template for generating command line from parameter values
  21. 21. Output datasets generated by the tool
  22. 22. Special actions to be run before / after execution
  23. 23. Functional tests to be run with the “full stack” in place
  24. 24. Running functional tests for a specific tool on the command line
  25. 25. Test results, on command line and as HTML report
  26. 26. Dealing with more complex interface needs
  27. 27. Repeating sets of parameters
  28. 28. Template language for building complex command lines
  29. 29. Conditional groups, grouping constructs can be nested
  30. 30. Command line tool expects a configuration file
  31. 31. Configuration file is generated based on user input
  32. 32. Job execution in Galaxy
  33. 33. Flexible execution environment • Dependencies between jobs handled by “JobManager” within Galaxy. • Either in-process with the web application, or a separate process managing a queue to which multiple front-ends submit
  34. 34. Flexible execution environment • Once jobs are ready, they are submitted to a “JobRunner” • Runners are pluggable • Can have multiple runners, and send jobs to different runners depending on capabilities • Current implementations: • Local runner executing a limited number of local processes • PBS runner dispatches to a cluster of worker nodes • Pluggable queueing policies in the works!
  35. 35. Deeper customization of Galaxy
  36. 36. Galaxy web interface is easily customized / branded
  37. 37. Custom datatypes • Datatypes supported by a Galaxy instance can be configured at runtime • Completely reengineering “metadata” • Easy way to define custom metadata • Automatically generated editing interfaces (similar to tool interfaces) • Actions on datatypes (displaying at external sites, format conversion) all pluggable • Nothing “genomics” specific will be hardcoded!
  38. 38. The future
  39. 39. Future tool development • Tools for statistical genetics • Collaborating closely with the “RGenetics” project (http://rgenetics.org) • Tools for phylogenetic analysis • Based on HyPhy (http://hyphy.org)
  40. 40. Workflow support • Workflow construction by example • Users will continue to build analyses as they do now, and will be able to extract portions of their histories as reusable workflows • Will probably work for most existing histories! (we’ve been saving the right data all along) • Explicit workflow construction and editing • Support for repetitive invocation of tools and workflows, and aggregation of results • Saving and sharing of workflows, reproducible!
  41. 41. Some Technical Details
  42. 42. Under the hood • Python 2.4, though some dependencies use CPython-specific extensions • Web framework: PythonPaste, Routes, WebHelpers, Beaker, CheetahTemplate, ... • SQLAlchemy for database abstraction
  43. 43. Out of the box configuration • Just check out from Subversion and run! • All dependencies packaged as eggs • Pure Python HTTP server included (paste.httpserver) • Embedded database (SQLite) • Datasets stored on local filesystem • Jobs run locally
  44. 44. PSU production configuration • Deployed behind Apache using mod_proxy • Python threads do not scale across CPUs, so we use both forking and threading, similar to Apache’s worker MPM • PostgreSQL • Jobs dispatched to a PBS cluster using “pbs-python”
  45. 45. The core Galaxy development team
  46. 46. Acknowledgements • Galaxy collaborators: • Ross Lazarus, Sergei Kosakovsky Pond • UCSC Genome Browser team • Biomart team • National Science Foundation
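
The pluggable runner dispatch described on slides 33 and 34 can be sketched roughly as follows. This is only an illustration under stated assumptions, not Galaxy's actual code: the class names `LocalRunner` and `ClusterRunnerStub`, the `can_run` / `submit` methods, the `dispatch` function, and the `needs_cluster` job key are all hypothetical names invented for this example. The local runner caps the number of concurrent local processes, and the cluster runner is a stub that only records what it would submit.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


class LocalRunner:
    """Hypothetical runner: runs jobs as local processes, a few at a time."""

    name = "local"

    def __init__(self, max_workers=2):
        # The pool size bounds how many local processes run concurrently.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def can_run(self, job):
        # Assumption: jobs flag themselves when they need cluster resources.
        return not job.get("needs_cluster", False)

    def submit(self, job):
        # Returns a Future whose result is the process exit code.
        return self._pool.submit(subprocess.call, job["command"])


class ClusterRunnerStub:
    """Hypothetical stand-in for a cluster runner; it only records jobs."""

    name = "cluster"

    def __init__(self):
        self.submitted = []

    def can_run(self, job):
        return True

    def submit(self, job):
        self.submitted.append(job)
        return None


def dispatch(job, runners):
    """Hand the job to the first runner that says it can run it."""
    for runner in runners:
        if runner.can_run(job):
            return runner.name, runner.submit(job)
    raise LookupError("no runner can handle this job")


if __name__ == "__main__":
    runners = [LocalRunner(max_workers=1), ClusterRunnerStub()]
    small_job = {"command": [sys.executable, "-c", "pass"]}
    big_job = {"command": ["some_tool"], "needs_cluster": True}
    for job in (small_job, big_job):
        runner_name, _handle = dispatch(job, runners)
        print(runner_name)
```

Keeping the capability check on the runner, rather than in the dispatcher, is one way to let new runners be plugged in without touching the dispatch logic.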
