Galaxy

Title: Galaxy
Author: James Taylor

Published in: Business, Technology
Transcript

  • 1. Galaxy (http://g2.bx.psu.edu)
  • 2. What is Galaxy? • An open-source framework for integrating various computational tools and databases into a cohesive workspace • A web-based service we (Penn State) provide, integrating many popular tools and resources for comparative genomics • A completely self-contained Python application for building your own Galaxy-style sites
  • 3. Galaxy’s web user interface
  • 4. Integrating tools into Galaxy
  • 5. How Galaxy integrates existing web-based tools
  • 6. Proxy based tools User makes request to Galaxy
  • 7. Proxy based tools Galaxy delegates request to external site
  • 8. Proxy based tools External site generates response • If data, Galaxy determines type, processes, and adds to ‘history’ • Otherwise, return response to user
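
The proxy flow on slides 6 to 8 can be sketched roughly as follows. This is only an illustration, not Galaxy's code: the function name `handle_proxy_response`, the `DATA_TYPES` table, and the idea of deciding by MIME type are all assumptions made for the example.

```python
# Hypothetical mapping from response MIME type to a dataset extension.
# Galaxy's real type detection is not described in the slides.
DATA_TYPES = {
    "text/plain": "txt",
    "text/tab-separated-values": "tabular",
}


def handle_proxy_response(content_type, body, history):
    """Route a response that came back from the external site.

    If the response looks like data, store it in the user's history and
    return None; otherwise return the body so it can be passed to the user.
    """
    mime = content_type.split(";")[0].strip().lower()
    if mime in DATA_TYPES:
        history.append({"ext": DATA_TYPES[mime], "data": body})
        return None
    return body
```

Here `history` is just a list standing in for the user's history; an ordinary HTML page passes through untouched.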
  • 9. External tools User makes request to Galaxy
  • 10. External tools Galaxy sends user directly to external site with extra URL data
  • 11. External tools User interacts directly with external site
  • 12. External tools When data is generated, the user is sent back to Galaxy. Galaxy can fetch the data immediately, or wait for a notification from the external site
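
One way to picture the "extra URL data" from slide 10 is a redirect URL that carries a return address, so the external site knows where to send the user afterwards. The sketch below assumes a query parameter called `return_url` and example host names; neither comes from the slides.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_external_url(external_base, return_url, extra=None):
    """Append a return address (and optional extras) to an external tool URL.

    The parameter name "return_url" is hypothetical.
    """
    parts = urlsplit(external_base)
    query = parse_qsl(parts.query)          # keep the site's own parameters
    query.append(("return_url", return_url))
    query.extend(sorted((extra or {}).items()))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment)
    )
```

The external site would later redirect the browser back to `return_url`, or call it to notify Galaxy that data is ready.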
  • 13. How Galaxy integrates existing command line tools
  • 14. HTML inputs generated from abstract parameter description
  • 15. HTML inputs generated from abstract parameter description
  • 16. HTML inputs generated from abstract parameter description
  • 17. HTML inputs generated from abstract parameter description
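
As a rough illustration of generating HTML inputs from an abstract parameter description, the sketch below turns a small dictionary into a form field. The dictionary layout and the `render_input` helper are invented for this example; Galaxy's actual tool description format is not shown in the transcript.

```python
from html import escape


def render_input(name, spec):
    """Render one abstract parameter description as an HTML form field."""
    label = escape(spec.get("label", name))
    if spec["type"] == "select":
        options = "".join(
            '<option value="{0}">{0}</option>'.format(escape(option))
            for option in spec["options"]
        )
        return '<label>{}: <select name="{}">{}</select></label>'.format(
            label, escape(name), options
        )
    # Integers get a numeric field, everything else a plain text field.
    html_type = "number" if spec["type"] == "integer" else "text"
    return '<label>{}: <input type="{}" name="{}"></label>'.format(
        label, html_type, escape(name)
    )
```

Because the description is abstract, the same dictionary could also drive validation or command line generation without touching the HTML.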
  • 18. Tool help generated from a simple text format
  • 19. Automatic input validation based on type, or more...
  • 20. Template for generating command line from parameter values
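
Slides 19 and 20 together suggest a pipeline of "validate by type, then fill a command line template". A minimal sketch, assuming a made-up tool `filter.py` and using the standard library's `string.Template` as a stand-in for Galaxy's Cheetah templates:

```python
import shlex
from string import Template

# Hypothetical parameter description and command template.
PARAMS = {
    "input": {"type": "text"},
    "min_length": {"type": "integer", "min": 1},
}
COMMAND = Template("filter.py --input $input --min-length $min_length")


def validate(params, raw):
    """Convert raw form strings to typed values, raising ValueError if invalid."""
    clean = {}
    for name, spec in params.items():
        value = raw[name]
        if spec["type"] == "integer":
            value = int(value)  # raises ValueError for non-numeric input
            if "min" in spec and value < spec["min"]:
                raise ValueError("%s must be >= %d" % (name, spec["min"]))
        clean[name] = value
    return clean


def build_command(template, values):
    """Fill the template, shell-quoting every value."""
    quoted = {key: shlex.quote(str(value)) for key, value in values.items()}
    return template.substitute(quoted)
```

Quoting each value before substitution keeps user input from being interpreted by the shell.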
  • 21. Output datasets generated by the tool
  • 22. Special actions to be run before / after execution
  • 23. Functional tests to be run with the “full stack” in place
  • 24. Running functional tests for a specific tool on the command line
  • 25. Test results, on command line and as HTML report
  • 26. Dealing with more complex interface needs
  • 27. Repeating sets of parameters
  • 28. Template language for building complex command lines
  • 29. Conditional groups, grouping constructs can be nested
  • 30. Command line tool expects a configuration file
  • 31. Configuration file is generated based on user input
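
The ideas on slides 27 to 31 (repeating parameter sets, conditional groups, and a generated configuration file) can be combined in a small sketch. The section and key names below are invented, and the INI format is only an assumption about what such a tool might expect.

```python
import configparser
import io


def render_config(values):
    """Build configuration file text from submitted form values."""
    config = configparser.ConfigParser()
    config["run"] = {"mode": values["mode"]}
    # Conditional group: the threshold only applies in "advanced" mode.
    if values["mode"] == "advanced":
        config["run"]["threshold"] = str(values["threshold"])
    # Repeating set of parameters: one section per track the user added.
    for index, track in enumerate(values["tracks"], start=1):
        config["track%d" % index] = {
            "name": track["name"],
            "color": track["color"],
        }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()
```

The returned text would be written to a temporary file whose path is then passed on the tool's command line.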
  • 32. Job execution in Galaxy
  • 33. Flexible execution environment • Dependencies between jobs handled by “JobManager” within Galaxy. • Either in-process with the web application, or a separate process managing a queue to which multiple front-ends submit
  • 34. Flexible execution environment • Once jobs are ready, they are submitted to a “JobRunner” • Runners are pluggable • Can have multiple runners, and send jobs to different runners depending on capabilities • Current implementations: • Local runner executing a limited number of local processes • PBS runner dispatches to a cluster of worker nodes • Pluggable queueing policies in the works!
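
A toy version of the manager and runner split described on slides 33 and 34 might look like the sketch below. The class and field names (`can_run`, `depends_on`, `needs_cluster`) are assumptions for illustration, and jobs "run" synchronously here, which a real job manager would not do.

```python
from collections import deque


class Runner:
    """A pluggable runner; `accepts` decides which jobs it can take."""

    def __init__(self, name, accepts):
        self.name = name
        self.accepts = accepts

    def can_run(self, job):
        return self.accepts(job)

    def run(self, job):
        pass  # a real runner would start a process or submit to a cluster


class JobManager:
    """Holds jobs until their dependencies finish, then picks a runner."""

    def __init__(self, runners):
        self.runners = runners
        self.waiting = deque()
        self.finished = set()

    def submit(self, job):
        self.waiting.append(job)

    def dispatch(self):
        order = []
        progressed = True
        while self.waiting and progressed:
            progressed = False
            for _ in range(len(self.waiting)):
                job = self.waiting.popleft()
                if not set(job.get("depends_on", ())) <= self.finished:
                    self.waiting.append(job)  # dependencies not done yet
                    continue
                runner = next((r for r in self.runners if r.can_run(job)), None)
                if runner is None:
                    raise RuntimeError("no runner for job %s" % job["id"])
                runner.run(job)
                self.finished.add(job["id"])
                order.append((job["id"], runner.name))
                progressed = True
        return order
```

Runners are tried in order, so listing the local runner first keeps small jobs off the cluster.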
  • 35. Deeper customization of Galaxy
  • 36. Galaxy web interface is easily customized / branded
  • 37. Custom datatypes • Datatypes supported by a Galaxy instance can be configured at runtime • Completely reengineering “metadata” • Easy way to define custom metadata • Automatically generated editing interfaces (similar to tool interfaces) • Actions on datatypes (displaying at external sites, format conversion) all pluggable • Nothing “genomics” specific will be hardcoded!
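
To make "datatypes configured at runtime" with pluggable conversions concrete, here is a small registry sketch. `DatatypeRegistry` and its methods are hypothetical names for this example and are not taken from Galaxy's source.

```python
class DatatypeRegistry:
    """Runtime registry of datatypes, their metadata fields, and converters."""

    def __init__(self):
        self.metadata_fields = {}
        self.converters = {}

    def register(self, ext, metadata_fields=()):
        # Metadata field names could drive an auto-generated editing form.
        self.metadata_fields[ext] = tuple(metadata_fields)

    def add_converter(self, source, target, func):
        for ext in (source, target):
            if ext not in self.metadata_fields:
                raise KeyError("unknown datatype: %s" % ext)
        self.converters[(source, target)] = func

    def convert(self, source, target, data):
        try:
            func = self.converters[(source, target)]
        except KeyError:
            raise ValueError("no converter from %s to %s" % (source, target))
        return func(data)
```

Since nothing in the registry knows about genomics, a new format only needs a `register` call and, optionally, converter functions.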
  • 38. The future
  • 39. Future tool development • Tools for statistical genetics • Collaborating closely with the “RGenetics” project (http://rgenetics.org) • Tools for phylogenetic analysis • Based on HyPhy (http://hyphy.org)
  • 40. Workflow support • Workflow construction by example • Users will continue to build analyses as they do now, and will be able to extract portions of their histories as reusable workflows • Will probably work for most existing histories! (we’ve been saving the right data all along) • Explicit workflow construction and editing • Support for repetitive invocation of tools and workflows, and aggregation of results • Saving and sharing of workflows, reproducible!
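
"Workflow construction by example" relies on each history item remembering which tool, parameters, and inputs produced it. A minimal sketch of extracting the steps behind one dataset, assuming a made-up history layout where each item is a dict with `output`, `tool`, `params`, and `inputs` keys:

```python
def extract_workflow(history, target_id):
    """Return the ordered tool steps needed to reproduce one dataset.

    Datasets that no history item produced (for example uploads) are treated
    as workflow inputs and contribute no step.
    """
    by_output = {item["output"]: item for item in history}
    steps = []
    seen = set()

    def visit(dataset_id):
        item = by_output.get(dataset_id)
        if item is None or dataset_id in seen:
            return
        seen.add(dataset_id)
        for parent in item["inputs"]:
            visit(parent)  # make sure upstream steps come first
        steps.append({"tool": item["tool"], "params": item["params"]})

    visit(target_id)
    return steps
```

Steps that did not contribute to the chosen dataset are left out, which is what makes a portion of a history reusable on its own.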
  • 41. Some Technical Details
  • 42. Under the hood • Python 2.4, though some dependencies use CPython specific extensions • Web framework: PythonPaste, Routes, WebHelpers, Beaker, CheetahTemplate, ... • SQLAlchemy for database abstraction
  • 43. Out of the box configuration • Just check out from subversion and run! • All dependencies packaged as eggs • Pure Python HTTP server included (paste.httpserver) • Embedded database (sqlite) • Datasets stored on local filesystem • Jobs run locally
  • 44. PSU production configuration • Deployed behind Apache using mod_proxy • Python threads do not scale across CPUs, so we use both forking and threading, similar to Apache’s worker MPM • PostgreSQL • Jobs dispatched to a PBS cluster using “pbs-python”
  • 45. The core Galaxy development team
  • 46. Acknowledgements • Galaxy collaborators: • Ross Lazarus, Sergei Kosakovsky Pond • UCSC Genome Browser team • Biomart team • National Science Foundation
