GenePattern: Ted Liefeld

    Presentation Transcript

    • GenePattern Overview for MAGE-TAB Workshop, Ted Liefeld, January 24, 2007
    • GenePattern: a platform for integrative genomics
      • [Architecture diagram. Components: Client User Interfaces (Desktop, Programming, Web), Pipeline Environment, Module Repository, Module Integrator. Methods shown: KNN, SVM, SOM, GSEA, NMF, PCA. Example pipeline (Golub and Slonim et al. 1999): inputs all_aml_train and all_aml_test; steps Preprocess, Class Neighbors, Weighted Voting Cross-Val, Weighted Voting Train/Test, SOM Clustering; viewers SOM Cluster Viewer, Marker Selection Viewer, Prediction Results Viewer]
    • Features
      • Automatic Module Integration
        • Add new modules without writing code
        • Supports any command line callable code (language independent)
      • Multiple user interfaces
        • Desktop client
        • Web client
        • Programmatic interfaces to Java, MATLAB, R
      • Local and Distributed Computing
        • Laptop
        • Client/Server
        • Compute farm
        • Public server (1/2008)
      • Interoperability
        • caBIG
          • caArray
          • caGrid
        • geWorkbench
        • Cytoscape
      • Analytic Reproducibility
        • Easy, rapid sharing of methodologies via pipelines
        • Versioning using Life Sciences Identifier (LSID)
        • Executable history of all sessions
        • Automatic pipeline generation from result files
        • Executable research documents
      • Comprehensive Module Repository
        • ~90 modules: analysis, visualization, pipelines
        • Expression, proteomic, sequence, variation (SNP), and whole genome association data
        • Construction of context-sensitive, flexible analytic workflows
        • Module suites
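
The LSID-based versioning listed under Analytic Reproducibility can be illustrated with a small parser. This is a minimal sketch that assumes the general LSID URN layout, `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`. The identifiers and helper names below are made up for illustration; they are not real GenePattern module IDs or GenePattern code.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]  # None when the LSID carries no revision


def parse_lsid(text: str) -> LSID:
    """Split an LSID URN into its parts (illustrative, minimal validation)."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


def same_module(a: str, b: str) -> bool:
    """True when two LSIDs name the same object, ignoring the revision."""
    return parse_lsid(a)[:3] == parse_lsid(b)[:3]


# Hypothetical identifiers, for illustration only.
v1 = "urn:lsid:example.org:analysis.module:00042:1"
v2 = "urn:lsid:example.org:analysis.module:00042:2"
print(same_module(v1, v2), parse_lsid(v2).revision)
```

Keeping the revision as the last field is what would let every version of a module stay addressable while still being recognizable as the same module.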
    • Gene Expression Analysis
      • Differential Marker Analysis
      • Gene Neighbors
      • caArray Retriever
      • GEO Download
      • Expression File Creator
      • Threshold
      • Variation Filter
      • MAGE-ML Import
      • MAGE-TAB Import…
    • SNP Analysis
      • Copy Number Estimation
      • Smoothing
      • LOH determination
      • Batch Correction
      • SNPViewer
      • SNPFileCreator
      • X Chromosome Correction
      • GISTIC pipeline (soon…)
    • Statistical Methods & Machine Learning Analyses
      • Prediction
        • K-Nearest Neighbors (KNN)
        • Weighted Voting (WV)
        • Support Vector Machines (SVM)
        • Probabilistic Neural Networks (PNN)
        • Classification and Regression Trees (CART)
      • Clustering
        • Hierarchical
        • k-Means
        • SOM
        • Consensus
      • Pathway Analysis
        • GSEA
        • ARACNE
        • Cytoscape
      • Other Statistical Methods
        • Missing value imputation
        • Kolmogorov-Smirnov score
        • Non-negative Matrix Factorization (NMF)
        • Principal Components Analysis (PCA)
    • Module Integrator
      • Add modules and visualizers without writing code
      • Share custom analysis tasks
      • Integrate your own or “third-party” tools easily
      • Add tools to a common repository
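
One way to picture "any command line callable code" becoming a module is a declarative command template that is filled in at run time. The sketch below is an assumption about the general idea, not GenePattern's actual module format: `ModuleSpec`, its fields, and the `{placeholder}` syntax are all hypothetical.

```python
import shlex
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class ModuleSpec:
    """Hypothetical description of a command-line tool registered as a module."""
    name: str
    command_template: str  # placeholders such as {text} are filled at run time

    def build_argv(self, **params: str) -> List[str]:
        # Split first, then substitute per token, so a value containing
        # spaces still ends up as a single argument.
        return [token.format(**params) for token in shlex.split(self.command_template)]

    def run(self, **params: str) -> str:
        result = subprocess.run(self.build_argv(**params),
                                capture_output=True, text=True, check=True)
        return result.stdout


# A toy "tool": the Python interpreter upper-casing its first argument.
upper = ModuleSpec(
    name="UpperCase",
    command_template='{python} -c "import sys; print(sys.argv[1].upper())" {text}',
)
print(upper.run(python=sys.executable, text="hello world"))
```

Because the wrapper only sees a command line, the wrapped tool could equally be written in R, Perl, or a compiled language.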
    • Pipelines for reproducible research
      • [Pipeline diagram (Golub and Slonim et al. 1999): inputs all_aml_train and all_aml_test; steps Preprocess, Class Neighbors, Weighted Voting Cross-Val, Weighted Voting Train/Test, SOM Clustering; viewers SOM Cluster Viewer, Marker Selection Viewer, Prediction Results Viewer]
      • Users can design workflows where the input to any module is the output of any previous module
      • Users can start with a result and automatically generate the workflow that created it
      • Input data, parameters, and code (optionally) are packaged with a pipeline
      • Every version of a module or pipeline is retained and uniquely identified
      • Pipelines and modules are exportable/importable and can be shared among GenePattern users
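
The two pipeline properties above (any module can consume an earlier module's output, and a workflow can be regenerated from a result) can be sketched by having every result remember what produced it. This is a toy model under stated assumptions: `Result`, `run_step`, and `recover_workflow` are hypothetical names, and the lambda "modules" are stand-ins rather than GenePattern's implementations.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Result:
    value: object
    module: str                 # name of the step that produced this result
    params: Dict[str, object]   # parameters the step was run with
    parent: Optional["Result"]  # the result this step consumed, if any


def run_step(module: str, func: Callable, source: Result, **params) -> Result:
    """Run func on the previous result's value and record the provenance."""
    return Result(func(source.value, **params), module, params, source)


def recover_workflow(result: Result) -> List[Tuple[str, Dict[str, object]]]:
    """Walk back from a result to the ordered list of steps that created it."""
    steps = []
    node: Optional[Result] = result
    while node is not None:
        steps.append((node.module, node.params))
        node = node.parent
    return list(reversed(steps))


raw = Result([5.0, 120.0, 40.0, 900.0], "Input", {}, None)
floored = run_step("Threshold", lambda xs, floor: [max(x, floor) for x in xs], raw, floor=20.0)
top = run_step("TopN", lambda xs, n: sorted(xs, reverse=True)[:n], floored, n=2)

for name, params in recover_workflow(top):
    print(name, params)
```

Starting from `top` alone, the recovered list contains every step and parameter needed to rerun the analysis, which is the idea behind generating a pipeline from a result file.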
    • GenePattern as a Visualization & Analysis Engine
      • [Diagram: portal at http://www.broad.mit.edu/mmgp; GenePattern SNPViewer visualizer (running as applet); GenePattern analyses run on LSF worker nodes]
    • Using MAGE-ML today
    • MAGE-TAB use tomorrow
      • Ideally
        • Be able to automatically find raw/derived bioassay data when parsing MAGE-TAB files
          • Use MAGE-TAB like our native (tab-delimited) data formats, GCT, RES in (almost) any GenePattern analysis module
          • Not require user interaction to specify Assays or quantitation types
          • Open question: MGED Ontology terms for common data transformation protocols (e.g. RMA, MAS5) in addition to free text?
      • Sub-optimal but still good
        • Have an interactive viewer to convert from MAGE-TAB to a native format (e.g. MAGE-ML import viewer)
          • Human interaction required…
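
To make the target of such a conversion concrete, here is a minimal sketch that writes a tab-delimited matrix as GCT text (version line `#1.2`, a dimensions line, a `Name`/`Description` header, then one row per feature). The input layout (sample names in the first row, identifiers in the first column) and the `na` description placeholder are assumptions of this sketch; locating the right matrix and quantitation type inside a MAGE-TAB package is the hard part the slide describes and is not attempted here.

```python
from typing import List


def matrix_to_gct(tsv_text: str) -> str:
    """Convert a simple tab-delimited matrix into GCT text (illustrative)."""
    rows: List[List[str]] = [line.split("\t") for line in tsv_text.splitlines() if line.strip()]
    header, data = rows[0], rows[1:]
    samples = header[1:]  # first header cell labels the identifier column
    lines = ["#1.2", f"{len(data)}\t{len(samples)}",
             "\t".join(["Name", "Description"] + samples)]
    for row in data:
        if len(row) != len(header):
            raise ValueError(f"row {row[0]!r} has {len(row) - 1} values, expected {len(samples)}")
        # No per-feature description is available in this input, so use a placeholder.
        lines.append("\t".join([row[0], "na"] + row[1:]))
    return "\n".join(lines) + "\n"


example = "ID\tS1\tS2\ng1\t1.0\t2.0\ng2\t3.0\t4.0\n"
print(matrix_to_gct(example))
```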
    • More MAGE-TAB thoughts
      • Define structure/format for keeping multiple MAGE-TAB files together
        • IDF, ADF, SDRF, raw data files -> package together as ZIP? tgz?
          • Sub directories in the zip? (defined)
      • Does MAGE-TAB support multiple arrays in one file?
        • Useful & MAGE-ML allows this now (but I don’t like it for automated processing)
          • E.g. E-GEOD-995.mageml.tgz from ArrayExpress
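
The packaging question above can be made concrete with a small sketch that bundles the files of one experiment into a ZIP with a fixed directory layout. The layout itself (IDF and SDRF at the top level, ADF under `adf/`, everything else under `data/`) and the `.idf.txt`/`.sdrf.txt`/`.adf.txt` suffix convention are assumptions chosen for illustration; the slide leaves the structure open, and nothing here is defined by MAGE-TAB.

```python
import io
import zipfile
from typing import Dict

# Hypothetical layout: file-name suffix -> folder inside the archive.
LAYOUT = {".idf.txt": "", ".sdrf.txt": "", ".adf.txt": "adf/"}
DATA_DIR = "data/"  # raw and derived data files that match no suffix above


def package_magetab(files: Dict[str, bytes]) -> bytes:
    """Return ZIP bytes containing the given files in the assumed layout."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(files):
            lower = name.lower()
            folder = next((d for suffix, d in LAYOUT.items() if lower.endswith(suffix)),
                          DATA_DIR)
            archive.writestr(folder + name, files[name])
    return buffer.getvalue()


blob = package_magetab({
    "exp.idf.txt": b"Investigation Title\tExample\n",
    "exp.sdrf.txt": b"Source Name\tArray Data File\n",
    "chip.adf.txt": b"Array Design Name\tExample\n",
    "raw1.cel": b"placeholder bytes",
})
print(zipfile.ZipFile(io.BytesIO(blob)).namelist())
```

A fixed layout like this would let an import module find the IDF first and resolve every other file relative to it without user interaction.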
    • More MAGE-TAB thoughts
      • Persistent identifiers
        • For protocols, samples etc
          • Allow use of SDRF, data matrix (e.g. in GP) with persistent references to external entities
            • Array details, experiment design, etc
      • Question?
        • Should we consider MAGE-TAB DAG to record data processing pipelines (provenance - HLA)?
          • e.g. a protocol for each module execution added to MAGE-TAB file outputs
            • File growth issues…
          • Record all analysis for a publication
          • Add additional SDRF file at each step
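
The "protocol for each module execution" idea, and the file growth it implies, can be sketched by appending a `Protocol REF` column and a `Derived Array Data File` column to an SDRF table for every processing step. The SDRF is modeled here as a plain list of rows; the function name, protocol names, and file names are hypothetical.

```python
from typing import List

Table = List[List[str]]


def add_processing_step(sdrf: Table, protocol: str, outputs: List[str]) -> Table:
    """Return a new SDRF table with one more protocol/output column pair."""
    header, body = sdrf[0], sdrf[1:]
    if len(outputs) != len(body):
        raise ValueError("need exactly one output file per SDRF row")
    new_header = header + ["Protocol REF", "Derived Array Data File"]
    new_body = [row + [protocol, out] for row, out in zip(body, outputs)]
    return [new_header] + new_body


sdrf: Table = [
    ["Source Name", "Array Data File"],
    ["sample1", "s1.cel"],
    ["sample2", "s2.cel"],
]
# Hypothetical module executions recorded as protocols.
step1 = add_processing_step(sdrf, "P-Preprocess-1", ["s1.pre.txt", "s2.pre.txt"])
step2 = add_processing_step(step1, "P-Threshold-1", ["s1.thr.txt", "s2.thr.txt"])
print(len(sdrf[0]), len(step1[0]), len(step2[0]))
```

Each recorded step widens every row by two columns, which is the growth concern raised on the slide; writing a separate SDRF per step trades that width for more files.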
      • Collaborations
        • caBIG
        • MAGNet NCBC
        • NCIBI NCBC
      • Release Information
        • Initially released in March 2004
        • Current version 3.0, released April 2007
          • 3.1 due Feb 08
        • Currently 5900+ users, 500+ organizations, ~90 countries
      • Availability
        • Freely available
        • Windows, Mac OS, and Unix platforms
      • Resources
        • http://www.genepattern.org
        • User workshops, documentation, email help desk, online user forum
        • Reich et al. (2006) Nature Genetics
      GenePattern is a winner of the 2005 BioIT World Best Practices Award