Wilson Make Bosc2008

Published in: Technology, Education

  1. Use the Make Utility for the Maintenance of Complex Bioinformatics Pipelines
     Justin Wilson, Manhong Dai, Stanley Watson, Fan Meng
     Psychiatry Department and Molecular and Behavioral Neuroscience Institute, University of Michigan
  2. Make
       • First released in 1977 by Stuart Feldman at Bell Labs
       • Originally designed for compiling programs
       • General-purpose automation tool
           • Compilation
           • Analysis
           • Situations where one file depends on another
  3. The Name of the Game
       • No data is an island unto itself
       • Typical bioinformatics pipeline
           • Collect data from various sources
               • Internet utilities
           • Processing
               • Scripts
               • Parsers
               • Programs
               • Database
           • Packaged as web service + database
  4. Updating
       • Re-running the pipeline
           • Driving factor: Demand for “current” information
           • Limiting factor: Resources (i.e., time) required
  5. Using Make
       • Stage 1: Download (configure)
           • Download (new data)
           • Not always trivial (new URLs)
       • Stage 2: File processing
           • Downloaded files -> processed files -> ...
           • Only new files are processed
       • Stage 2.1: Database
           • Model tables, indexes, views as files
           • File processing becomes SQL statement + touch
  6. Projects
       • WGAS
           • http://arrayanalysis.mbni.med.umich.edu
           • Harvests Affymetrix CEL files from GEO and ArrayExpress every night
       • Local copy of dbSNP
           • Initial load took hours
           • Minor update took minutes
  7. Projects
       • CustomCDF
           • Search Google for "customcdf"
           • Aligns Affymetrix probes to reference sequences for various organisms
           • Large # of data sources
           • Made future modifications easier
           • Makefile submits jobs to cluster
  8. Acknowledgements
       • The authors are members of the Pritzker Neuropsychiatric Disorders Research Consortium, which is supported by the Pritzker Neuropsychiatric Disorders Research Fund L.L.C. This work is also supported in part by the National Center for Integrated Biomedical Informatics through NIH grant 1U54DA021519-01A1 to the University of Michigan.
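
The database stage on slide 5 ("Model tables, indexes, views as files" and "File processing becomes SQL statement + touch") can be illustrated with a rough sketch. It is written in Python rather than as a Makefile so that the timestamp rule is explicit, and it is not the authors' code. The table name `probes`, its two columns, the tab-separated input format, and the function names `is_stale` and `load_table` are all assumptions made for illustration. SQLite stands in for whatever database the pipeline actually used.

```python
import os
import sqlite3
from pathlib import Path
from typing import Sequence


def is_stale(target: Path, deps: Sequence[Path]) -> bool:
    """Make-style check: stale if the target is missing or older than any dependency."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime_ns
    return any(dep.stat().st_mtime_ns > target_mtime for dep in deps)


def load_table(db_path: Path, data_file: Path, sentinel: Path) -> bool:
    """Reload the (hypothetical) probes table only if data_file is newer than sentinel.

    The sentinel is an empty file that stands in for the table, so that a
    file-timestamp comparison can decide whether the SQL needs to run again.
    Returns True if the SQL ran, False if the table was already up to date.
    """
    if not is_stale(sentinel, [data_file]):
        return False

    # Assumed input format: one "probe_id<TAB>sequence" record per line.
    rows = [line.split("\t") for line in data_file.read_text().splitlines() if line]

    conn = sqlite3.connect(str(db_path))
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DROP TABLE IF EXISTS probes")
            conn.execute("CREATE TABLE probes (probe_id TEXT, sequence TEXT)")
            conn.executemany("INSERT INTO probes VALUES (?, ?)", rows)
    finally:
        conn.close()

    # The "touch" step: record on disk that the table is now current.
    sentinel.touch()
    return True
```

Calling `load_table` twice in a row runs the SQL only the first time. It runs again only once the data file's modification time is newer than the sentinel's, which mirrors "Only new files are processed" from Stage 2. In a real Makefile the sentinel would be the rule's target and the data file its prerequisite.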