Wilson Make Bosc2008

Transcript

  • 1. Use the Make Utility for the Maintenance of Complex Bioinformatics Pipelines
    • Justin Wilson, Manhong Dai, Stanley Watson, Fan Meng
    • Psychiatry Department and Molecular and Behavioral Neuroscience Institute, University of Michigan
  • 2. Make
    • First released in 1977 by Stuart Feldman at Bell Labs
    • Originally designed for compiling programs
    • General purpose automation tool
      • Compilation
      • Analysis
      • Situations where one file depends on another
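The dependency idea on this slide can be sketched in a few lines. The following is only an illustration of the timestamp rule Make applies (rebuild a target when it is missing or older than something it depends on), not Make's actual implementation. The function name `needs_rebuild` and the file names are hypothetical.

```python
import os
import tempfile


def needs_rebuild(target, prerequisites):
    """Return True if target is missing or older than any prerequisite.

    Simplification: a missing prerequisite raises FileNotFoundError here,
    whereas Make would first look for a rule that can build it.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        raw = os.path.join(workdir, "raw.txt")        # hypothetical downloaded file
        parsed = os.path.join(workdir, "parsed.txt")  # hypothetical processed file

        open(raw, "w").close()
        print(needs_rebuild(parsed, [raw]))  # target does not exist yet

        open(parsed, "w").close()
        os.utime(raw, (1000, 1000))     # pretend raw.txt is old
        os.utime(parsed, (2000, 2000))  # and parsed.txt was built afterwards
        print(needs_rebuild(parsed, [raw]))  # target is newer than its input

        os.utime(raw, (3000, 3000))     # a fresh download arrives
        print(needs_rebuild(parsed, [raw]))  # input is now newer than target
```

In a real Makefile the same relationship is declared as a rule (target, prerequisites, and the command that rebuilds the target), and Make performs this comparison for every rule in the dependency graph.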
  • 3. The Name of the Game
    • No data is an island unto itself
    • Typical bioinformatics pipeline
      • Collect data from various sources
        • Internet utilities
      • Processing
        • Scripts
        • Parsers
        • Programs
        • Database
      • Packaged as web service + database
  • 4. Updating
    • Re-running the pipeline
      • Driving factor: Demand for “current” information
      • Limiting factor: Resources (i.e., time) required
  • 5. Using Make
    • Stage 1: Download (configure)
      • Download (new data)
      • Not always trivial (new URLs)
    • Stage 2: File processing
      • Downloaded files -> processed files -> ...
      • Only new files are processed
    • Stage 2.1: Database
      • Model tables, indexes, views as files
      • File processing becomes SQL statement + touch
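One way to read "SQL statement + touch" is that each table gets an empty stamp file whose modification time records when the table was last loaded. A rough sketch of that pattern follows, under several assumptions that do not come from the slides: SQLite stands in for the database, and the table name `snp`, its two columns, and the helper names `is_stale` and `load_table` are all hypothetical.

```python
import os
import sqlite3
from pathlib import Path


def is_stale(stamp, prerequisites):
    """True if the stamp file is missing or older than any prerequisite."""
    if not os.path.exists(stamp):
        return True
    stamp_mtime = os.path.getmtime(stamp)
    return any(os.path.getmtime(p) > stamp_mtime for p in prerequisites)


def load_table(conn, data_file, stamp):
    """Reload the hypothetical 'snp' table only if data_file is newer than stamp.

    Returns True if the SQL ran, False if the table was already up to date.
    Expects data_file to hold two tab-separated fields per line.
    """
    if not is_stale(stamp, [data_file]):
        return False
    with open(data_file) as fh:
        rows = [line.rstrip("\n").split("\t") for line in fh if line.strip()]
    with conn:  # commits on success, rolls back if a statement fails
        conn.execute("DROP TABLE IF EXISTS snp")
        conn.execute("CREATE TABLE snp (id TEXT PRIMARY KEY, chrom TEXT)")
        conn.executemany("INSERT INTO snp VALUES (?, ?)", rows)
    Path(stamp).touch()  # the "touch" step: the stamp now represents the table
    return True
```

Calling `load_table` twice in a row runs the SQL only the first time; it runs again only after the data file changes. In a Makefile, the stamp file would be the rule's target, the data file its prerequisite, and the SQL command plus `touch` its recipe, so indexes and views that depend on the table can list the stamp as their own prerequisite.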
  • 6. Projects
    • WGAS
      • http://arrayanalysis.mbni.med.umich.edu
      • Harvests Affymetrix CEL files from GEO and ArrayExpress every night
    • Local copy of dbSNP
      • Initial load took hours
      • Minor update took minutes
  • 7. Projects
    • CustomCDF
      • Search Google for "customcdf"
      • Aligns Affymetrix probes to reference sequences for various organisms
      • Large number of data sources
      • Made future modifications easier
      • Makefile submits jobs to cluster
  • 8. Acknowledgements
    • The authors are members of the Pritzker Neuropsychiatric Disorders Research Consortium, which is supported by the Pritzker Neuropsychiatric Disorders Research Fund L.L.C. This work is also supported in part by the National Center for Integrated Biomedical Informatics through NIH grant 1U54DA021519-01A1 to the University of Michigan.