
BioMake PAG 2017



Presentation on BioMake, a GNU-Make-like utility for managing builds and complex workflows using declarative specifications. From the GMOD/PAG 2017 meeting.

Published in: Software


  1. 1. BioMake. Ian Holmes & Christopher Mungall. UC Berkeley, Berkeley Lab
  2. 2. Bioinformatics analysis pipelines involve chains of dependencies
  3. 3. GNU make: compiling a C program. Rules, patterns, variables, automatic variables.
      Makefile:
          CC = gcc
          %.o: %.c
              $(CC) -c -o $@ $<
  4. 4. GNU make: aligning reads. Rules, patterns, variables, automatic variables.
      Makefile:
          BWA = bwa
          REF = ref.fasta
          %.sam: %.fastq
              $(BWA) mem $(REF) $< >$@
  5. 5. Dependencies. Timestamps, dependency graphs.
          BWA = bwa
          REF = ref.fasta
          %.sam: %.fastq $(REF).bwt
              $(BWA) mem $(REF) $< >$@
          %.bwt: %
              $(BWA) index $<
  6. 6. Functions. Special variables, functions, user-defined functions.
          MAKEFILE_PATH := $(abspath $(lastword $(MAKEFILE_LIST)))
          MAKEFILE_DIR := $(dir $(MAKEFILE_PATH))
          test1:
              echo $(shell ls $(MAKEFILE_DIR))
          FUNC = echo $1 is $2
          test2:
              $(call FUNC,make,cool)
  7. 7. Issues with GNU make
      • Only one wildcard per rule
      • Poor support for parallelism
      • Timestamps are fragile, especially on NFS
      • Can’t extend build logic, e.g. add qualifiers to rules
      Biomake fixes these!
  8. 8. Multiple wildcards per rule
      Makefile:
          $X-$Y.sam: $X.fa.bwt $Y.fastq
              bwa mem $X.fa $Y.fastq >$@
      Command:
          biomake myref-myreads.sam
  9. 9. MD5 signatures
      Makefile:
          $X-$Y.sam: $X.fa.bwt $Y.fastq
              bwa mem $X.fa $Y.fastq >$@
      Command:
          biomake --md5-hash myref-myreads.sam
      File .biomake/md5/myref-myreads.sam:
          md5_hash("myref-myreads.sam",12,"0f723ae7f9bf07744445e93ac5595156").
          md5_valid("myref-myreads.sam",12,"0f723ae7f9bf07744445e93ac5595156",X) :-
              md5_check("myref.fa.bwt",6,"b1946ac92492d2347c6235b4d2611184",X),
              md5_check("myreads.fastq",6,"591785b794601e212b260e25925636fd",X).
  10. 10. Multiple queue engines
      Makefile:
          $X-$Y.sam: $X.fa.bwt $Y.fastq
              bwa mem $X.fa $Y.fastq >$@
      Commands:
          biomake -Q sge myref-myreads.sam
          biomake -Q slurm myref-myreads.sam
          biomake -Q pbs myref-myreads.sam
          biomake -Q poolq myref-myreads.sam
  11. 11. Logic extensions
          $X-$Y.sam: $X.fa.bwt $Y { size_file(Y,S), S < 1000000000 }
              bwa mem $X.fa $Y >$@
      Can embed Prolog in Makefiles, or auto-translate an entire Makefile to Prolog
  12. 12. (Some of) The Competition
          Tool         Language  GNU-compatible?  MD5?  Clusters?
          Erlang make  Erlang    No               No    No
          omake        OCaml     Somewhat         Yes   No
          makepp       Perl      Yes              Yes   No
          qmake        C         Yes              No    SGE
  13. 13. What Biomake is not (yet)
      • A massively parallel data analysis framework (try Apache Spark)
      • A heavy-duty cloud-oriented workflow language (try Common Workflow Language)
      • A web application for managing jobs (try Galaxy)
  14. 14. What Biomake is
      A simple, drop-in replacement for GNU make that allows you to…
      • ramp up your Makefile-driven workflows to cluster scale
      • avoid costly rebuilds triggered by file copying & unsynchronized clocks
      • extend rules with wildcards and logical tests
  15. 15. Credits
      • Ian Holmes
      • Contributors & testers
      • Mahesh Panchal
      • Markus Triska
      • Jan Wielemaker - creator of SWI-Prolog
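
The slides contrast timestamp-based rebuilds (fragile on NFS, after file copying, or with unsynchronized clocks) with MD5 signatures. The idea can be sketched in Python. This is a minimal illustration under stated assumptions, not Biomake's implementation: the helper names (`md5_of`, `record_build`, `is_up_to_date`, `demo`) and the JSON record file are hypothetical, whereas Biomake itself stores Prolog facts under `.biomake/md5/` as shown on slide 9. The demo builds a target, makes a dependency look newer without changing its contents, and then actually edits the dependency.

```python
import hashlib
import json
import os
import tempfile


def md5_of(path):
    """Return the hex MD5 digest of a file's contents."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_build(target, deps, record_path):
    """Store the hashes of the target and its dependencies after a build.

    The JSON layout is an assumption made for this sketch only.
    """
    hashes = {path: md5_of(path) for path in [target] + deps}
    with open(record_path, "w") as handle:
        json.dump(hashes, handle)


def is_up_to_date(target, deps, record_path):
    """True if the target and all dependencies match the recorded hashes."""
    if not (os.path.exists(target) and os.path.exists(record_path)):
        return False
    with open(record_path) as handle:
        recorded = json.load(handle)
    current = {path: md5_of(path) for path in [target] + deps}
    return recorded == current


def demo():
    """Return (after_build, dep_newer_by_mtime, after_touch, after_edit)."""
    with tempfile.TemporaryDirectory() as workdir:
        dep = os.path.join(workdir, "myreads.fastq")
        target = os.path.join(workdir, "myref-myreads.sam")
        record = os.path.join(workdir, "myref-myreads.sam.md5.json")

        with open(dep, "w") as handle:
            handle.write("@read1\nACGT\n+\nIIII\n")
        with open(target, "w") as handle:
            handle.write("placeholder alignment\n")
        record_build(target, [dep], record)
        after_build = is_up_to_date(target, [dep], record)

        # Simulate a copy or clock skew: the dependency now looks newer than
        # the target, which would trigger a rebuild under timestamp rules.
        stat = os.stat(target)
        os.utime(dep, (stat.st_atime + 1000, stat.st_mtime + 1000))
        dep_newer_by_mtime = os.path.getmtime(dep) > os.path.getmtime(target)
        after_touch = is_up_to_date(target, [dep], record)

        # A real content change invalidates the recorded signature.
        with open(dep, "a") as handle:
            handle.write("@read2\nTTTT\n+\nIIII\n")
        after_edit = is_up_to_date(target, [dep], record)

        return after_build, dep_newer_by_mtime, after_touch, after_edit


if __name__ == "__main__":
    print(demo())
```

In this sketch, only a change to a file's contents makes the target stale. A change to modification time alone does not, which is the property the "What Biomake is" slide describes as avoiding costly rebuilds.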