caArray: Juli Klemm (NCICB)


Published in: Technology

  1. 1. Support for MAGE-TAB in caArray 2.0: Overview and Feedback. MAGE-TAB Workshop, January 24, 2008
  2. 2. Agenda <ul><li>Brief overview of caArray 2.0 </li></ul><ul><li>caArray 2.0 and MAGE-TAB </li></ul><ul><li>MAGE-TAB feedback </li></ul>
  3. 3. What is caArray? <ul><li>caArray is a caBIG™-compliant microarray data repository at the NCICB </li></ul><ul><li>Developed to support a federated model of microarray data sharing </li></ul><ul><li>Developed in line with MIAME and MAGE guidelines </li></ul>(Figure labels: caArray 1.6, caArray 2.0)
  4. 4. Goals of caArray 2.0 <ul><li>Address Adopter feedback gained from our 1.x experience </li></ul><ul><ul><li>Improve the user experience for storing and retrieving data produced </li></ul></ul><ul><ul><li>Simplify and improve the performance of data access through the API and grid service, for analytical applications </li></ul></ul><ul><ul><li>Harmonize with caBIG™ tissue repository (caTissue) and annotation repository (caBIO) </li></ul></ul><ul><ul><li>Support additional array platforms, including SNP arrays </li></ul></ul><ul><ul><li>Organize the application around workflow between investigators and the labs that serve them </li></ul></ul><ul><li>Use an agile software development approach that will allow more frequent feature additions and better responsiveness to the user community </li></ul>
  5. 5. Features of caArray 2.0 <ul><li>Store array data associated with experiment and sample annotations </li></ul><ul><ul><li>Data entry through graphical user interface or MAGE-TAB </li></ul></ul><ul><li>Parse Affymetrix, Illumina and GenePix formats for expression and SNP arrays </li></ul><ul><li>Role-based permissions for data access </li></ul><ul><li>Programmatic access via a Java API and grid service </li></ul><ul><li>Manage protocols and controlled vocabularies </li></ul><ul><ul><li>MGED Ontology 1.3.1 comes pre-loaded </li></ul></ul><ul><li>Basic browse and search functionality </li></ul>
  6. 6. caArray 2.0 Annotations <ul><li>Capture information for </li></ul><ul><ul><li>Experiment information </li></ul></ul><ul><ul><li>Contacts </li></ul></ul><ul><ul><li>Publications </li></ul></ul><ul><ul><li>Sample Annotations </li></ul></ul><ul><ul><ul><li>Source </li></ul></ul></ul><ul><ul><ul><li>Sample </li></ul></ul></ul><ul><ul><ul><li>Extract </li></ul></ul></ul><ul><ul><ul><li>Labeled Extracts </li></ul></ul></ul><ul><ul><ul><li>Hybridizations </li></ul></ul></ul>
  7. 7. caArray 2.0 supported formats <ul><li>Parsable file formats </li></ul><ul><li>Annotation </li></ul><ul><ul><li>MAGE-TAB: ADF, IDF, SDRF </li></ul></ul><ul><li>Array data - parsed </li></ul><ul><ul><li>Affymetrix Expression and SNP </li></ul></ul><ul><ul><ul><li>.CDF, .CEL, .CHP </li></ul></ul></ul><ul><ul><li>Illumina Expression and SNP </li></ul></ul><ul><ul><ul><li>.CSV </li></ul></ul></ul><ul><ul><li>GenePix </li></ul></ul><ul><ul><ul><li>.GAL, .GPR </li></ul></ul></ul><ul><li>Unparsed formats </li></ul><ul><ul><li>Affymetrix: .dat, .exp, .rpt, .txt </li></ul></ul><ul><ul><li>Illumina: .txt, .idat </li></ul></ul><ul><ul><li>Agilent: .txt, .tsv </li></ul></ul><ul><ul><li>ImaGene: .txt, .tiv </li></ul></ul><ul><ul><li>Nimblegen: .txt, .gff </li></ul></ul>
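
The parsed/unparsed split on the slide above can be sketched as a lookup by file extension. This is only an illustration in Python, not caArray code: the function name `classify` and the idea of deciding purely on the extension are assumptions, and MAGE-TAB files that are saved with a generic `.txt` suffix would need a different check.

```python
from pathlib import Path

# Extensions the slide lists as parsed by caArray 2.0, keyed by vendor.
PARSED = {
    "Affymetrix": {".cdf", ".cel", ".chp"},
    "Illumina": {".csv"},
    "GenePix": {".gal", ".gpr"},
}

# MAGE-TAB annotation documents. Matching them by extension is an
# assumption made for this sketch.
ANNOTATION = {".adf", ".idf", ".sdrf"}


def classify(filename: str) -> str:
    """Return 'annotation', 'parsed:<vendor>', or 'unparsed' for a file name."""
    ext = Path(filename).suffix.lower()
    if ext in ANNOTATION:
        return "annotation"
    for vendor, extensions in PARSED.items():
        if ext in extensions:
            return f"parsed:{vendor}"
    # Anything else (.dat, .exp, .rpt, .txt, .idat, .tsv, .gff, ...) is
    # treated as stored but not parsed.
    return "unparsed"
```

For example, `classify("tumor01.CEL")` falls in the Affymetrix parsed group, while `classify("scan.dat")` is stored unparsed.
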
  8. 8. caArray 2.0 permissions <ul><li>Role-based permissions for each installation </li></ul><ul><ul><li>Anonymous user </li></ul></ul><ul><ul><li>System Administrator </li></ul></ul><ul><ul><li>Principal Investigator/Biostatistician/Lab Administrator/Lab Scientist </li></ul></ul><ul><li>Data is private until made public </li></ul><ul><ul><li>Experiment title, PI, and number of samples are visible, but experiment content is not available to the anonymous user </li></ul></ul><ul><li>Collaboration groups can be managed by the PI for pre-public collaboration </li></ul><ul><li>CSM 4.0 </li></ul><ul><ul><li>Experiment-level and sample-level security </li></ul></ul>
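
The "private until made public" rule on the slide above can be illustrated with a small sketch. It does not use CSM or any caArray class; `Experiment`, `visible_fields`, and the collaborator set are hypothetical names chosen only to show the rule that an anonymous user sees the title, PI, and sample count but not the experiment content.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Experiment:
    # Hypothetical model, not the caArray domain object.
    title: str
    principal_investigator: str
    sample_count: int
    is_public: bool = False
    collaborators: set = field(default_factory=set)
    content: dict = field(default_factory=dict)


def visible_fields(experiment: Experiment, user: Optional[str]) -> dict:
    """Return what a user may see; user=None models the anonymous user."""
    summary = {
        "title": experiment.title,
        "principal_investigator": experiment.principal_investigator,
        "sample_count": experiment.sample_count,
    }
    has_access = (
        experiment.is_public
        or user == experiment.principal_investigator
        or user in experiment.collaborators
    )
    if has_access:
        return {**summary, "content": experiment.content}
    # Private experiment, no access: only the summary is exposed.
    return summary
```

Per-role differences (Biostatistician, Lab Administrator, and so on) and sample-level security are left out of this sketch.
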
  9. 9. caArray 2.0 API and Grid Service <ul><li>Support for MAGE-TAB level of annotation – Simplified implementation of MAGE </li></ul><ul><li>API provides a data service and analytical services </li></ul><ul><ul><li>Data service allows users to use CQL to issue queries that traverse the domain model </li></ul></ul><ul><ul><li>Analytical services provide convenience methods for data access </li></ul></ul>
  10. 10. caArray 2.0 browse and search <ul><li>Browse by </li></ul><ul><ul><li>Experiments </li></ul></ul><ul><ul><li>Organism </li></ul></ul><ul><ul><li>Provider </li></ul></ul><ul><ul><li>Array design </li></ul></ul><ul><li>Search by specifying </li></ul><ul><ul><li>Keyword </li></ul></ul><ul><ul><li>Category </li></ul></ul>
  11. 11. MAGE-TAB in caArray 2.0 <ul><li>Support MAGE-TAB v1.0 – ADF, IDF, SDRF </li></ul><ul><li>Term Source providers and associated Terms are captured as Controlled Vocabularies (Manage Vocabularies) </li></ul><ul><li>Protocols imported and viewable in Manage Protocols </li></ul><ul><li>Characteristics displayed on the relevant detail pages </li></ul><ul><li>Original files are stored in association with the Experiment </li></ul><ul><ul><li>Edits made to the information in the UI are not reflected in these files </li></ul></ul><ul><ul><li>Future feature – MAGE-TAB export based on current database values </li></ul></ul>
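
To make the "Characteristics" import on the slide above concrete, here is a minimal Python sketch of reading `Characteristics[...]` columns from a tab-delimited SDRF. It is not the caArray importer: the function name, the sample rows, and the simplification of attaching every Characteristics column to the Source are all assumptions for illustration.

```python
import csv
import io
import re

# Matches SDRF headers such as "Characteristics[OrganismPart]",
# with or without a space before the bracket.
CHARACTERISTIC = re.compile(r"^Characteristics\s*\[(.+)\]$")

# Hypothetical two-line SDRF used only as example input.
EXAMPLE_SDRF = (
    "Source Name\tCharacteristics[Organism]\tTerm Source REF\t"
    "Characteristics[OrganismPart]\tSample Name\n"
    "mouse 1\tMus musculus\tNCBITaxon\tliver\tmouse 1 liver\n"
)


def read_characteristics(sdrf_text: str) -> dict:
    """Map each Source Name to its {category: value} characteristics."""
    rows = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = next(rows)
    # SDRF headers can repeat (e.g. "Term Source REF"), so track column
    # positions instead of building a dict keyed by header text.
    source_col = header.index("Source Name")
    char_cols = {}
    for index, name in enumerate(header):
        match = CHARACTERISTIC.match(name.strip())
        if match:
            char_cols[index] = match.group(1).strip()
    result = {}
    for row in rows:
        if not row:
            continue
        values = result.setdefault(row[source_col], {})
        for index, category in char_cols.items():
            if index < len(row) and row[index]:
                values[category] = row[index]
    return result
```

Calling `read_characteristics(EXAMPLE_SDRF)` groups the Organism and OrganismPart values under the source "mouse 1". A fuller reader would attach each Characteristics column to the node column that precedes it (Source, Sample, Extract, or Labeled Extract).
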
  12. 12. MAGE-TAB for data migration <ul><li>caArray 1.6 >> caArray 2.0 </li></ul><ul><li>Experiments in caArray 1.6 being migrated to 2.0 are exported in MAGE-TAB format along with the associated native array data files </li></ul><ul><li>Challenges included </li></ul><ul><ul><li>MAGE-OM >> MAGE-TAB mapping </li></ul></ul><ul><ul><li>Most challenges were due to validating that all data “made it” over (not really a MAGE-TAB issue) </li></ul></ul><ul><ul><li>Manual checking still needed </li></ul></ul><ul><li>Jackson Labs internal MAD database >> caArray 2.0 </li></ul>
  13. 13. MAGE-TAB Feedback <ul><li>Initial experience with end-user-type customers is that there is a learning curve associated with using the SDRF, especially with regard to applying controlled vocabularies </li></ul><ul><ul><li>Need tools to facilitate this </li></ul></ul><ul><li>Source vs. Sample vs. Extract vs. Labeled Extract </li></ul><ul><ul><li>Often confusion over “what goes where” </li></ul></ul><ul><li>From Jackson Labs: </li></ul><ul><ul><li>Documentation is good for a biologist-type end user, but a software engineer would like more detail </li></ul></ul><ul><ul><li>More real-life examples would be helpful </li></ul></ul>
  14. 14. Specific requests to consider <ul><li>Need a way to specify required fields for particular implementations </li></ul><ul><ul><li>caArray UI has certain required fields – need to be able to specify these in a MAGE-TAB template </li></ul></ul><ul><li>Associate “Supplemental” files with an experiment </li></ul><ul><li>In IDF, recommend adding a field to specify the type of array experiment (Gene Expression, SNP, aCGH, etc.) </li></ul>
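
The first request above, required fields per implementation, could be checked along the following lines. This is a hedged Python sketch, not a caArray or MAGE-TAB feature: the row labels are standard IDF tags, but the set `REQUIRED_IDF_TAGS` and the function `missing_required` are hypothetical, since the slides do not say which fields the caArray UI requires.

```python
import csv
import io

# Hypothetical implementation profile: the tags are real IDF row labels, but
# which ones a given installation requires is an assumption for this sketch.
REQUIRED_IDF_TAGS = ("Investigation Title", "Person Last Name", "SDRF File")


def missing_required(idf_text: str, required=REQUIRED_IDF_TAGS) -> list:
    """Return required IDF tags that are absent or have no non-empty value."""
    present = set()
    # An IDF is row-oriented: the first cell is the tag, the rest are values.
    for row in csv.reader(io.StringIO(idf_text), delimiter="\t"):
        if row and any(cell.strip() for cell in row[1:]):
            present.add(row[0].strip())
    return [tag for tag in required if tag not in present]
```

An installation could run such a check before import and report the missing tags back to the submitter instead of failing partway through the upload.
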