IGA Workflow


Published in: Technology

  1. 1. IGA Workflow
     Alessandro Gervaso, Vittorio Zamboni
  2. 2. Introduction
     IGA Workflow is a web-based tool created using the Django framework.
     It is intended as a management tool for the IGA wet lab.
     We use it to track the lifecycle of biological samples, from vial to file.
  3. 3. Overview
     ● Laboratory management, from sample to flowcell
     ● Bioinformatic analyses
     ● Pipeline management
     ● Technology
     ● Other applications and future developments
  4. 4. Biology for dummies
  5. 5. Technology - Basics
     Database server
     ● PostgreSQL
     ● Redis
     Workers
     ● Celery
     Web server
     ● nginx + uWSGI
  6. 6. Overview - Lab
     ● SAMPLE: the basic unit in the lab.
     ● LIBRARY: a treated sample with an attached chemical TAG.
     ● POOL: a set of libraries, ready to be placed on a flowcell lane.
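The three lab entities above could be modelled roughly as follows. This is a minimal sketch in plain Python rather than the tool's actual Django models, which the slides do not show; the field names, the string representation of the tag, and the rule that tags are unique within a pool are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    """The basic unit in the lab."""
    name: str


@dataclass
class Library:
    """A treated sample with an attached chemical tag."""
    sample: Sample
    tag: str  # hypothetical: the tag stored as a short string


@dataclass
class Pool:
    """A set of libraries, ready to be placed on a flowcell lane."""
    name: str
    libraries: List[Library] = field(default_factory=list)

    def add(self, library: Library) -> None:
        # Assumption: tags must be unique within a pool, so that results
        # can be traced back to a single library after sequencing.
        if any(lib.tag == library.tag for lib in self.libraries):
            raise ValueError(f"tag {library.tag!r} already used in pool {self.name!r}")
        self.libraries.append(library)


pool = Pool("pool-1")
pool.add(Library(Sample("sample-A"), tag="ACGT"))
pool.add(Library(Sample("sample-B"), tag="TGCA"))
print([lib.sample.name for lib in pool.libraries])
```

Keeping the uniqueness check inside `Pool.add` means an invalid pool cannot be built by accident, which is the kind of guarantee a paper notebook cannot give.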
  7. 7. Overview - Lab
  8. 8. Overview - Lab
     The main challenge was to replace notebooks with a tool that allows users to:
     ● insert samples, libraries and pools, and edit them;
     ● create lanes and runs, and the configuration files for the physical sequencer;
     ● collect the sequencer results and map them in an easy way.
     Almost done!
  9. 9. Overview - Lab
     We started using the basic Django admin, BUT:
     ● page loading was slow
     ● due to the admin's nature, we lacked flexibility
     ● we were forcing procedures on the lab staff
     ● management was cumbersome
  10. 10. Overview - Analyses
     After the physical sequencing, the raw data (basecalls) must be converted into FASTQ files.
     FASTQ files are FASTA-like files with embedded quality scores.
     They are the starting point for almost every genomic analysis.
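To make the FASTQ description concrete, here is a minimal reader sketch. It relies on the common four-line record layout (a header starting with `@`, the sequence, a separator starting with `+`, and one quality character per base). The Phred offset of 33 is an assumption; older sequencer output may use a different offset. The function and type names are hypothetical.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream that uses four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Each quality character encodes one score: ord(char) - offset.
        yield FastqRecord(header[1:], sequence, [ord(c) - offset for c in quality])


example = io.StringIO("@read1\nACGT\n+\nII5!\n")
for record in read_fastq(example):
    print(record.name, record.sequence, record.qualities)
```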
  11. 11. Overview - Analyses
     To optimize time and resources, we use a cluster of Celery workers.
     ● we track the software packages used
     ● we track their parameters
     ● we create a set of useful stats
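One way to picture the tracking described above is a small record that stores the software, its version, its parameters and a few summary statistics together. This is only a sketch: the record layout, the tool name `example-converter`, its parameters, and the choice of statistics are all hypothetical, since the slides do not say which stats are collected.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class AnalysisRecord:
    tool: str                   # hypothetical tool name
    version: str
    parameters: Dict[str, str]
    stats: Dict[str, float] = field(default_factory=dict)


def read_stats(reads: List[str]) -> Dict[str, float]:
    """Compute a few simple summary statistics over read sequences."""
    total_bases = sum(len(r) for r in reads)
    gc_bases = sum(r.count("G") + r.count("C") for r in reads)
    return {
        "reads": len(reads),
        "mean_length": total_bases / len(reads) if reads else 0.0,
        "gc_fraction": gc_bases / total_bases if total_bases else 0.0,
    }


record = AnalysisRecord(
    tool="example-converter",
    version="0.1",
    parameters={"--threads": "4"},
    stats=read_stats(["ACGT", "GGCC", "AT"]),
)
# Serialising the record makes it easy to store next to the results.
print(json.dumps(asdict(record), indent=2, sort_keys=True))
```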
  12. 12. Technology
     Additions:
     ● Informational Celery tasks (list directory contents, copy files between devices with a dbus plugin and UDisks instead of ugly hacks, ...)
  13. 13. Overview - Pipelines
     FASTQ file alignments and assemblies.
     Each analysis uses different software, in sequence or in parallel.
     With hundreds of samples, the analyses can't be handled manually.
  14. 14. Pipelines
     The results of each pipeline (like those of the previous analyses) must be tracked.
     Since CLI-based software is not user-friendly, we developed a graphical pipeline builder.
     Users are able to choose and combine different software tools to perform their own analyses.
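A pipeline whose steps run "in sequence or in parallel" can be described as a dependency graph. The sketch below uses the standard-library `graphlib` module (Python 3.9+) to group steps into batches that could run concurrently. The step names are hypothetical, and this is not the authors' pipeline builder, only an illustration of the idea.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
pipeline = {
    "trim": set(),
    "align": {"trim"},
    "assemble": {"trim"},
    "report": {"align", "assemble"},
}


def execution_batches(steps):
    """Group steps into batches; steps within one batch can run in parallel."""
    sorter = TopologicalSorter(steps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


print(execution_batches(pipeline))
```

Here `align` and `assemble` land in the same batch because both depend only on `trim`, while `report` waits for both of them.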
  15. 15. Pipeline
     ● Workers have different queues in order to serve different kinds of tasks
     ● Worker tasks communicate with each other through Redis to avoid inconsistencies and to improve performance
  16. 16. Pipelines
     Different queues (stored in Redis dbs):
       ○ available
       ○ queued
       ○ active
       ○ completed
       ○ error
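The five queue names above can be read as the states a job passes through, although that reading is an interpretation of the slide. The sketch below uses in-memory deques as a stand-in for the Redis-backed queues, so it runs without a server. The transition table, including the retry path from `error` back to `queued`, is an assumption.

```python
from collections import deque

STATES = ("available", "queued", "active", "completed", "error")

# Allowed moves between queues; this transition table is an assumption.
TRANSITIONS = {
    "available": {"queued"},
    "queued": {"active"},
    "active": {"completed", "error"},
    "error": {"queued"},  # assumed: failed jobs can be retried
    "completed": set(),
}


class JobQueues:
    """In-memory stand-in for the Redis-backed queues."""

    def __init__(self):
        self.queues = {state: deque() for state in STATES}

    def add(self, job_id):
        self.queues["available"].append(job_id)

    def move(self, job_id, source, target):
        if target not in TRANSITIONS[source]:
            raise ValueError(f"cannot move a job from {source} to {target}")
        self.queues[source].remove(job_id)  # raises ValueError if absent
        self.queues[target].append(job_id)

    def state_of(self, job_id):
        for state, queue in self.queues.items():
            if job_id in queue:
                return state
        return None


queues = JobQueues()
queues.add("job-1")
queues.move("job-1", "available", "queued")
queues.move("job-1", "queued", "active")
queues.move("job-1", "active", "completed")
print(queues.state_of("job-1"))
```

Because every job sits in exactly one queue at a time, two workers cannot both believe they own the same job, which is one way to avoid the inconsistencies a shared queue store is meant to prevent.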
  17. 17. Pipelines
     Video demonstration of a working pipeline that uses different kinds of steps.
     (1 video: 1 minute)
  18. 18. Under development
     ● simple interface that allows customers to:
       ○ insert their samples directly
       ○ watch the results of their pipeline in a genome browser - also made with Django (see below)!
     ● barcodes
     ● genome browser (like GMOD GBrowse, but with the greatness of Python instead of the confusion of Perl)
  19. 19. Genome browser
     An application that allows browsing a genome's annotations (like genes, or where reads are aligned).
     Currently, the best web genome browser is GMOD GBrowse.
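At its core, browsing annotations means asking which features overlap the region currently on screen. The following is a minimal sketch of that query, not code from the browser itself. The `Feature` type, the example gene names, and the half-open coordinate convention (0-based start, exclusive end) are assumptions.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    chrom: str
    start: int  # assumed 0-based, inclusive
    end: int    # assumed exclusive
    name: str


def features_in_region(features: List[Feature], chrom: str,
                       start: int, end: int) -> List[Feature]:
    """Return the features overlapping the half-open window [start, end)."""
    return [
        f for f in features
        if f.chrom == chrom and f.start < end and f.end > start
    ]


annotations = [
    Feature("chr1", 100, 500, "geneA"),
    Feature("chr1", 450, 900, "geneB"),
    Feature("chr2", 100, 500, "geneC"),
]
print([f.name for f in features_in_region(annotations, "chr1", 480, 600)])
```

A linear scan like this is fine for a handful of features; a real browser would likely need an index (for example an interval tree) to stay quick on whole genomes.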
  20. 20. Genome browser
     The challenge is to develop a genome browser that has a set of basic features and can accept plugins for particular types of data - like GMOD GBrowse.
     In addition, it must be quick and easy to manage - NOT like GMOD GBrowse.
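The plugin idea can be sketched as a registry that maps each type of data to the function that knows how to render it. This is a generic pattern, not the browser's actual plugin API, which the slides do not describe; the decorator name, the data type names, and the text-only "rendering" are hypothetical.

```python
from typing import Callable, Dict, List

# Registry mapping a data type name to the function that renders it.
TRACK_PLUGINS: Dict[str, Callable[[List[dict]], str]] = {}


def track_plugin(data_type: str):
    """Decorator that registers a renderer for one type of track data."""
    def register(func):
        TRACK_PLUGINS[data_type] = func
        return func
    return register


@track_plugin("genes")
def render_genes(items):
    return ", ".join(f"{item['name']}:{item['start']}-{item['end']}" for item in items)


@track_plugin("coverage")
def render_coverage(items):
    return " ".join(str(item["depth"]) for item in items)


def render(data_type, items):
    """Dispatch to the plugin registered for this data type."""
    try:
        plugin = TRACK_PLUGINS[data_type]
    except KeyError:
        raise ValueError(f"no plugin registered for {data_type!r}") from None
    return plugin(items)


print(render("genes", [{"name": "geneA", "start": 100, "end": 500}]))
```

With this layout the core only needs `render`; supporting a new type of data means adding one decorated function, without touching the basic features.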
  21. 21. Genome browser
     Video demonstration of the genome browser.
     (2 videos: 1 minute + 1 minute)
  22. 22. Acknowledgments
     ● The wet-lab ladies at IGA
     ● WEBdeBS
     ● Teams that developed:
       ○ Django
       ○ jQuery
       ○ nginx
       ○ uWSGI
       ○ Celery
       ○ Redis
       ○ pip and virtualenv
       ○ PostgreSQL
     ● All the open source projects involved