Wilson Make Bosc2008
Published in: Technology, Education

Transcript

  • 1. Use the Make Utility for the Maintenance of Complex Bioinformatics Pipelines
    • Justin Wilson, Manhong Dai, Stanley Watson, Fan Meng
    • Psychiatry Department and Molecular and Behavioral Neuroscience Institute, University of Michigan
  • 2. Make
    • First released in 1977 by Stuart Feldman at Bell Labs
    • Originally designed for compiling programs
    • General purpose automation tool
      • Compilation
      • Analysis
      • Situations where one file depends on another
  • 3. The Name of the Game
    • No data is an island unto itself
    • Typical bioinformatics pipeline
      • Collect data from various sources
        • Internet utilities
      • Processing
        • Scripts
        • Parsers
        • Programs
        • Database
      • Packaged as web service + database
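The pipeline on this slide can be read as a dependency graph, which is the structure Make works from. The sketch below is an illustration in Python, not the authors' code: the file names are hypothetical, and `graphlib` (Python 3.9+) stands in for Make's own ordering logic. It only shows that once each file lists what it depends on, a valid processing order follows mechanically.

```python
from graphlib import TopologicalSorter

# Hypothetical file names, one per pipeline stage on the slide.
# Each key depends on the files in its set (its prerequisites).
pipeline = {
    "raw_download.txt": set(),                  # collected from an external source
    "parsed.tsv": {"raw_download.txt"},         # produced by a script or parser
    "table_loaded.stamp": {"parsed.tsv"},       # marks a finished database load
    "web_service.stamp": {"table_loaded.stamp"},  # marks the packaged service
}


def build_order(dependencies):
    """Return the targets in an order where every prerequisite comes first."""
    return list(TopologicalSorter(dependencies).static_order())


order = build_order(pipeline)
print(order)
```

In a Makefile the same information would be written as one rule per target; the dictionary here is only a stand-in for those rules.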
  • 4. Updating
    • Re-running the pipeline
      • Driving factor: Demand for “current” information
      • Limiting factor: Resources (i.e., time) required
  • 5. Using Make
    • Stage 1: Download (configure)
      • Download (new data)
      • Not always trivial (new URLs)
    • Stage 2: File processing
      • Downloaded files -> processed files -> ...
      • Only new or changed files are reprocessed
    • Stage 2.1: Database
      • Model tables, indexes, views as files
      • File processing becomes SQL statement + touch
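Stages 2 and 2.1 rest on one rule: a target is rebuilt when it is missing or older than something it depends on, and a database table can take part in that rule if an ordinary file is touched after each successful load. The following is a minimal sketch of that idea in Python, assuming SQLite and hypothetical names (`is_stale`, `load_table`, the `snp` table, the `.stamp` file); it mimics Make's timestamp check and is not the authors' Makefile.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def is_stale(target: Path, prerequisites: list[Path]) -> bool:
    """Mimic Make's rule: stale if missing or older than any prerequisite."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(p.stat().st_mtime > target_mtime for p in prerequisites)


def load_table(conn: sqlite3.Connection, data_file: Path, stamp: Path) -> bool:
    """Reload the table only when the stamp file is stale; return True if it ran."""
    if not is_stale(stamp, [data_file]):
        return False
    ids = [line.strip() for line in data_file.read_text().splitlines() if line.strip()]
    conn.execute("CREATE TABLE IF NOT EXISTS snp (id TEXT)")
    conn.execute("DELETE FROM snp")
    conn.executemany("INSERT INTO snp (id) VALUES (?)", [(i,) for i in ids])
    conn.commit()
    stamp.touch()  # the "touch": the stamp file now stands in for the table
    return True


conn = sqlite3.connect(":memory:")
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "ids.txt"            # hypothetical processed file
    stamp = Path(tmp) / "snp_table.stamp"   # hypothetical stand-in for the table
    data.write_text("rs1\nrs2\n")

    first = load_table(conn, data, stamp)   # stamp missing, so the load runs
    second = load_table(conn, data, stamp)  # stamp is current, so it is skipped

    # Simulate new data arriving: rewrite the file, then backdate the stamp so
    # the comparison does not depend on filesystem timestamp resolution.
    data.write_text("rs1\nrs2\nrs3\n")
    older = data.stat().st_mtime - 10
    os.utime(stamp, (older, older))
    third = load_table(conn, data, stamp)   # data is newer, so the load reruns

count = conn.execute("SELECT COUNT(*) FROM snp").fetchone()[0]
print(first, second, third, count)
```

The stamp file carries no data of its own; only its modification time matters, which is what lets tables, indexes, and views be treated like any other file in the dependency chain.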
  • 6. Projects
    • WGAS
      • http://arrayanalysis.mbni.med.umich.edu
      • Harvests Affymetrix CEL files from GEO and ArrayExpress every night
    • Local copy of dbSNP
      • Initial load took hours
      • Minor update took minutes
  • 7. Projects
    • CustomCDF
      • Search Google for "customcdf"
      • Aligns Affymetrix probes to reference sequences for various organisms
      • Large number of data sources
      • Using Make made future modifications easier
      • Makefile submits jobs to cluster
  • 8. Acknowledgements
    • The authors are members of the Pritzker Neuropsychiatric Disorders Research Consortium, which is supported by the Pritzker Neuropsychiatric Disorders Research Fund L.L.C. This work is also supported in part by the National Center for Integrated Biomedical Informatics through NIH grant 1U54DA021519-01A1 to the University of Michigan.