BioMake PAG 2017


Presentation on BioMake, a GNU-Make-like utility for managing builds and complex workflows using declarative specifications. From the GMOD/PAG 2017 meeting.

  1. BioMake. Ian Holmes & Christopher Mungall. UC Berkeley, Berkeley Lab
  2. Bioinformatics analysis pipelines involve chains of dependencies
  3. GNU make: rules, patterns, variables, automatic variables. Makefile for compiling a C program:
         CC = gcc
         %.o: %.c
             $(CC) -c -o $@ $<
  4. GNU make: rules, patterns, variables, automatic variables. Makefile for aligning reads:
         BWA = bwa
         REF = ref.fasta
         %.sam: %.fastq
             $(BWA) mem $(REF) $< >$@
  5. Dependencies: timestamps, dependency graphs
         BWA = bwa
         REF = ref.fasta
         %.sam: %.fastq $(REF).bwt
             $(BWA) mem $(REF) $< >$@
         %.bwt: %
             $(BWA) index $<
  6. Functions: special variables, functions, user-defined functions
         MAKEFILE_PATH := $(abspath $(lastword $(MAKEFILE_LIST)))
         MAKEFILE_DIR := $(dir $(MAKEFILE_PATH))
         test1:
             echo $(shell ls $(MAKEFILE_DIR))
         FUNC = echo $1 is $2
         test2:
             $(call FUNC,make,cool)
  7. Issues with GNU make
         • Only one wildcard per rule
         • Poor support for parallelism
         • Timestamps are fragile, especially on NFS
         • Can’t extend build logic, e.g. add qualifiers to rules
     Biomake fixes these!
  8. Multiple wildcards per rule
         $X-$Y.sam: $X.fa.bwt $Y.fastq
             bwa mem $X.fa $Y.fastq >$@
     Run:
         biomake myref-myreads.sam
  9. MD5 signatures
         $X-$Y.sam: $X.fa.bwt $Y.fastq
             bwa mem $X.fa $Y.fastq >$@
     Run:
         biomake --md5-hash myref-myreads.sam
     Contents of .biomake/md5/myref-myreads.sam:
         md5_hash("myref-myreads.sam",12,"0f723ae7f9bf07744445e93ac5595156").
         md5_valid("myref-myreads.sam",12,"0f723ae7f9bf07744445e93ac5595156",X) :-
             md5_check("myref.fa.bwt",6,"b1946ac92492d2347c6235b4d2611184",X),
             md5_check("myreads.fastq",6,"591785b794601e212b260e25925636fd",X).
  10. Multiple queue engines
         $X-$Y.sam: $X.fa.bwt $Y.fastq
             bwa mem $X.fa $Y.fastq >$@
     Run with any of:
         biomake -Q sge myref-myreads.sam
         biomake -Q slurm myref-myreads.sam
         biomake -Q pbs myref-myreads.sam
         biomake -Q poolq myref-myreads.sam
  11. Logic extensions
         $X-$Y.sam: $X.fa.bwt $Y { size_file(Y,S), S < 1000000000 }
             bwa mem $X.fa $Y >$@
     Can embed Prolog in Makefiles, or auto-translate an entire Makefile to Prolog
  12. (Some of) The Competition
         Tool         Language  GNU-compatible?  MD5?  Clusters?
         Erlang make  Erlang    No               No    No
         omake        OCaml     Somewhat         Yes   No
         makepp       Perl      Yes              Yes   No
         qmake        C         Yes              No    SGE
  13. What Biomake is not (yet)
         • A massively parallel data analysis framework (try Apache Spark)
         • A heavy-duty cloud-oriented workflow language (try Common Workflow Language)
         • A web application for managing jobs (try Galaxy)
  14. What Biomake is: a simple, drop-in replacement for GNU make that allows you to…
         • ramp up your Makefile-driven workflows to cluster scale
         • avoid costly rebuilds triggered by file copying & unsynchronized clocks
         • extend rules with wildcards and logical tests
  15. • Ian Holmes
      • Contributors & testers
         • Mahesh Panchal
         • Markus Triska
         • Jan Wielemaker (creator of SWI-Prolog)
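
Slide 8 shows a target pattern with two wildcards, `$X-$Y.sam`, being matched against `myref-myreads.sam`. The Python sketch below illustrates one way such a pattern could be bound to a concrete target and then used to expand the dependency names. It is not Biomake's implementation (Biomake is written in Prolog); the helper names `pattern_to_regex`, `bind`, and `expand` are hypothetical, and the shortest-match choice for each variable is an assumption made for this sketch.

```python
import re

# Variable syntax assumed for this sketch: '$' followed by an identifier.
VAR = r"\$([A-Za-z_][A-Za-z0-9_]*)"

def pattern_to_regex(pattern):
    """Compile a target pattern such as '$X-$Y.sam' into a regex
    with one named group per variable (hypothetical helper)."""
    # re.split with one capture group alternates: literal, name, literal, ...
    parts = re.split(VAR, pattern)
    regex = ""
    seen = set()
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)          # literal text
        elif part in seen:
            regex += f"(?P={part})"           # repeated variable must match again
        else:
            seen.add(part)
            regex += f"(?P<{part}>.+?)"       # first use: shortest non-empty match
    return re.compile(regex)

def bind(pattern, target):
    """Return a dict of variable bindings, or None if the target does not match."""
    m = pattern_to_regex(pattern).fullmatch(target)
    return m.groupdict() if m else None

def expand(template, bindings):
    """Substitute bound variables into a dependency or command template.
    Automatic variables such as $@ are left untouched."""
    return re.sub(VAR, lambda m: bindings[m.group(1)], template)

bindings = bind("$X-$Y.sam", "myref-myreads.sam")
deps = [expand(d, bindings) for d in ("$X.fa.bwt", "$Y.fastq")]
print(bindings, deps)
```

With the target from slide 8, `X` binds to `myref` and `Y` to `myreads`, so the dependencies expand to `myref.fa.bwt` and `myreads.fastq`. A target such as `a-b-c.sam` is ambiguous; this sketch simply takes the shortest `X`, which may differ from what Biomake does.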
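
Slides 7, 9, and 14 contrast fragile timestamps with MD5 signatures: a target counts as valid when its own hash and the hashes of its dependencies match what was recorded at build time. The Python sketch below illustrates that idea under stated assumptions. It is not Biomake's code or its on-disk format (Biomake stores Prolog facts under `.biomake/md5/`); the in-memory `store` dictionary and the function names `md5_of`, `record`, and `up_to_date` are hypothetical.

```python
import hashlib
import os

def md5_of(path):
    """Hex MD5 digest of a file's contents, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def record(target, deps, store):
    """Remember the content hashes of a freshly built target and its dependencies.
    'store' is a plain dict standing in for a signature file (assumption)."""
    store[target] = {
        "target": md5_of(target),
        "deps": {d: md5_of(d) for d in deps},
    }

def up_to_date(target, deps, store):
    """True if the target and all dependencies still have their recorded hashes.
    File modification times are never consulted."""
    entry = store.get(target)
    if entry is None:
        return False
    if not all(os.path.exists(p) for p in [target, *deps]):
        return False
    if md5_of(target) != entry["target"]:
        return False
    if set(deps) != set(entry["deps"]):
        return False
    return all(md5_of(d) == entry["deps"][d] for d in deps)
```

Because only file contents are compared, copying a file or touching it with a skewed clock leaves `up_to_date` true, whereas a timestamp comparison would trigger a rebuild. The cost is that every check reads the files in full.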
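
Slide 11 attaches a Prolog guard, `{ size_file(Y,S), S < 1000000000 }`, to a rule so that the rule applies only when the file bound to `Y` is smaller than the given size. The Python sketch below mirrors that logic as a predicate over variable bindings. It is an illustration only: in Biomake the guard is evaluated by Prolog, and the names `small_enough` and `rule_applies` are hypothetical.

```python
import os

def small_enough(bindings, limit=1_000_000_000):
    """Guard mirroring { size_file(Y,S), S < 1000000000 }:
    succeed only if the file bound to Y exists and is under 'limit' bytes."""
    path = bindings["Y"]
    return os.path.exists(path) and os.path.getsize(path) < limit

def rule_applies(bindings, guards):
    """A rule is eligible only if every guard accepts the bindings."""
    return all(guard(bindings) for guard in guards)
```

A build tool using this approach would skip the rule and look for another way to build the target whenever `rule_applies` returns false.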