Wilson Make Bosc2008
Published in Technology, Education
  • 1. Use the Make Utility for the Maintenance of Complex Bioinformatics Pipelines
    • Justin Wilson, Manhong Dai, Stanley Watson, Fan Meng
    • Psychiatry Department and Molecular and Behavioral Neuroscience Institute, University of Michigan
  • 2. Make
    • First released in 1977 by Stuart Feldman at Bell Labs
    • Originally designed for compiling programs
    • General purpose automation tool
      • Compilation
      • Analysis
      • Situations where one file depends on another
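The dependency idea behind Make can be sketched in a few lines of Python. This is only an illustration of the timestamp rule (rebuild a target when it is missing or older than a file it depends on), not Make itself; the function name `needs_rebuild` and the file names `raw.txt` and `parsed.txt` are hypothetical, and the sketch ignores the case where a prerequisite must itself be built first.

```python
import os
import tempfile
import time


def needs_rebuild(target, prerequisites):
    """Return True if target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


def demo():
    """Exercise the rule on throwaway files and return three decisions."""
    with tempfile.TemporaryDirectory() as workdir:
        raw = os.path.join(workdir, "raw.txt")        # hypothetical downloaded file
        parsed = os.path.join(workdir, "parsed.txt")  # hypothetical processed file
        now = time.time()

        with open(raw, "w") as fh:
            fh.write("data\n")
        os.utime(raw, (now - 100, now - 100))         # set explicit timestamps
        before_build = needs_rebuild(parsed, [raw])   # target does not exist yet

        with open(parsed, "w") as fh:
            fh.write("parsed\n")
        os.utime(parsed, (now - 50, now - 50))
        after_build = needs_rebuild(parsed, [raw])    # target newer than its input

        os.utime(raw, (now, now))                     # simulate a fresh download
        after_update = needs_rebuild(parsed, [raw])   # input newer than target
        return before_build, after_build, after_update
```

Calling `demo()` walks through the three situations in order: nothing built yet, everything up to date, and an input that changed after the last build.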
  • 3. The Name of the Game
    • No data is an island unto itself
    • Typical bioinformatics pipeline
      • Collect data from various sources
        • Internet utilities
      • Processing
        • Scripts
        • Parsers
        • Programs
        • Database
      • Packaged as web service + database
  • 4. Updating
    • Re-running the pipeline
      • Driving factor: Demand for “current” information
      • Limiting factor: Resources (i.e., time) required
  • 5. Using Make
    • Stage 1: Download (configure)
      • Download (new data)
      • Not always trivial (new URLs)
    • Stage 2: File processing
      • Downloaded files -> processed files -> ...
      • Only new files are processed
    • Stage 2.1: Database
      • Model tables, indexes, views as files
      • File processing becomes SQL statement + touch
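One way to picture Stage 2.1 is a marker file that stands in for a database table: the SQL runs only when the input is newer than the marker, and the marker is then touched. The sketch below is an assumption-laden illustration in Python, not the authors' Makefile. SQLite, the table name `snp`, the file names, and the helpers `is_stale` and `load_table` are all hypothetical choices made for the example; the slides do not say which database or naming scheme was used.

```python
import os
import sqlite3
import tempfile
import time


def is_stale(stamp, prerequisites):
    """True if the marker file is missing or older than any prerequisite."""
    if not os.path.exists(stamp):
        return True
    stamp_mtime = os.path.getmtime(stamp)
    return any(os.path.getmtime(p) > stamp_mtime for p in prerequisites)


def load_table(db_path, data_file, stamp):
    """Reload the table only when data_file is newer than the marker file."""
    if not is_stale(stamp, [data_file]):
        return False
    with open(data_file) as fh:
        rows = [(line.strip(),) for line in fh if line.strip()]
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DROP TABLE IF EXISTS snp")    # hypothetical table
            conn.execute("CREATE TABLE snp (id TEXT)")
            conn.executemany("INSERT INTO snp (id) VALUES (?)", rows)
    finally:
        conn.close()
    # Equivalent of `touch`: create the marker or refresh its timestamp.
    with open(stamp, "a"):
        pass
    os.utime(stamp, None)
    return True


def demo():
    """Load once, then confirm a second run is skipped."""
    with tempfile.TemporaryDirectory() as workdir:
        data = os.path.join(workdir, "snp.txt")
        db = os.path.join(workdir, "pipeline.db")
        stamp = os.path.join(workdir, "snp.table.stamp")
        with open(data, "w") as fh:
            fh.write("rs1\nrs2\n")
        past = time.time() - 100
        os.utime(data, (past, past))          # input is older than "now"
        first = load_table(db, data, stamp)   # marker missing, so SQL runs
        second = load_table(db, data, stamp)  # marker is newer, so skipped
        conn = sqlite3.connect(db)
        count = conn.execute("SELECT COUNT(*) FROM snp").fetchone()[0]
        conn.close()
        return first, second, count
```

The marker is touched only after the SQL has committed, so a failed load leaves the table marked as out of date and it is retried on the next run.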
  • 6. Projects
    • WGAS
      • Harvests Affymetrix CEL files from GEO and ArrayExpress every night
    • Local copy of dbSNP
      • Initial load took hours
      • Minor update took minutes
  • 7. Projects
    • CustomCDF
      • Search Google for "customcdf"
      • Aligns Affymetrix probes to reference sequences for various organisms
      • Large number of data sources
      • Made future modifications easier
      • Makefile submits jobs to cluster
  • 8. Acknowledgements
    • The authors are members of the Pritzker Neuropsychiatric Disorders Research Consortium, which is supported by the Pritzker Neuropsychiatric Disorders Research Fund L.L.C. This work is also supported in part by the National Center for Integrated Biomedical Informatics through NIH grant 1U54DA021519-01A1 to the University of Michigan.