
MAGE-TAB introduction: Alvis Brazma (EBI)


  1. 1. MAGE-TAB - a simple tab-delimited format for describing microarray (and potentially other) experiments in a MIAME-compliant way. MAGE-TAB workshop, NCI, January 24, 2008
  2. 2. What is needed to describe a microarray (and potentially any) experiment adequately (MIAME)?
     • Description of biological sample (research subject, aliquot, i.e., biomaterial) properties
     • Description of the assay (e.g., microarray design)
     • Data from the assays, raw and processed
     • Material and data processing protocols
     • Experiment design: which sample went to which assay and produced which data file
  3. 3. How to do this?
     • Description of biological sample (research subject, aliquot, i.e., biomaterial) properties: a list of properties, free text or ontology entries; a table (spreadsheet)
     • Description of the assay (e.g., microarray design): a list of features on the array and their properties (sequence, annotation, location); a table (spreadsheet)
     • Data from the assays, raw and processed: files or tables
     • Material and data processing protocols: free text
     • Experiment design (which sample went to which assay and produced which data file): a graph (normally a DAG)
  4. 4. 5. Experiment design graph (figure). Nodes: Sample 1, Sample 2, Hybridisation, Data. Labels: Cy3, Cy5.
  5. 5. 5. Experiment design graph (figure). Nodes: Sample 1 (Homo S., Brain), Sample 2 (Homo S., Kidney), Hybridisation (SMD array 1), Data. Protocol labels: Protocol (Cy5), Protocol (Cy3), Protocol.
  6. 6. Figure (experiment design graph). Samples: liver 1, liver 2, Kidney 1, Kidney 2, Brain 1, Brain 2 (all Homo sap.). Hybridisations: Hybridisation 1, Hybridisation 2, Hybridisation 3 (all HG_U95A). Data files: Data1.cel, Data2.cel, Data3.cel, FGDM.txt. Protocols: Material processing protocols (P-XMPL-1), Normalisation protocol (P-XMPL-2).
  7. 7. Figure (experiment design graph), with the same labels as slide 6: six samples (liver 1, liver 2, Kidney 1, Kidney 2, Brain 1, Brain 2), three hybridisations on HG_U95A, Data1.cel to Data3.cel, FGDM.txt, and protocols P-XMPL-1 and P-XMPL-2.
  8. 10. Important observation
     • In high-throughput experiments the experiment design graphs are
         • Regular (similar subgraphs repeated many times)
         • For most nodes there is only a small number of incoming and outgoing edges
         • They can be presented in ‘layers’ in a natural way (in fact any DAG can be represented in layers)
     • This makes their representation as a spreadsheet simple and natural
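The observation above can be illustrated with a small sketch. It assigns each node of a design graph to a layer (its longest distance from a source node) and then writes every source-to-sink path as one spreadsheet row. The example graph is the two-sample hybridisation from the earlier slides; the function names `assign_layers` and `paths_as_rows` are hypothetical and not part of any MAGE-TAB tooling.

```python
from collections import defaultdict

# Illustrative design graph (an assumption): two samples, one hybridisation, one data node.
EDGES = [
    ("Sample 1", "Hybridisation"),
    ("Sample 2", "Hybridisation"),
    ("Hybridisation", "Data"),
]


def assign_layers(edges):
    """Return {node: layer}; layer = length of the longest path from any source."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))

    layer = {node: 0 for node in nodes}
    ready = [node for node in nodes if indegree[node] == 0]
    while ready:  # standard topological traversal
        node = ready.pop()
        for child in children[node]:
            layer[child] = max(layer[child], layer[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return layer


def paths_as_rows(edges):
    """Return one row (list of node names) per source-to-sink path."""
    children = defaultdict(list)
    has_parent = set()
    for parent, child in edges:
        children[parent].append(child)
        has_parent.add(child)
    sources = sorted({parent for parent, _ in edges} - has_parent)

    rows = []

    def walk(node, path):
        path = path + [node]
        next_nodes = children.get(node, [])
        if not next_nodes:  # sink reached: the path is a complete row
            rows.append(path)
        for child in next_nodes:
            walk(child, path)

    for source in sources:
        walk(source, [])
    return rows


layers = assign_layers(EDGES)
rows = paths_as_rows(EDGES)
for row in rows:
    print("\t".join(row))
```

In this small graph every edge joins adjacent layers, so the column position of a node in a row equals its layer. Because most nodes have few incoming and outgoing edges, the number of rows stays close to the number of samples.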
  9. 11. Layers in a more complex DAG (figure). Nodes: A, B, C, W, F, E, G, H, I. Layers: 0, 1, 2, 3, 4, 5.
  10. 13. In reality things are often even simpler
  11. 14. ... or even simpler than that
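  12. 15. (Figure: general row pattern.)
      Row 1: v11, v12, v13 with attribute lists (c111, ..., c11n), (c121, ..., c12m), (c113, ...); labels l1, l2, l3
      Row k: vk1, vk2, vk3 with attribute lists (ck11, ..., c11n), (c121, ..., c12m), (c113, ...); labels l1, l2, l3
      ...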
  13. 16. Real-world examples
      • Simplified E-TABM-234.sdrf
  14. 19. Ontology usage
      • Characteristics can be either free text or an ‘ontology entry’
      • An ontology entry is identified by the ‘source’ column that follows it
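A minimal sketch of how a reader might tell the two cases apart: a characteristic column immediately followed by a source column is treated as an ontology entry, anything else as free text. The column headers `Characteristics[...]` and `Term Source REF` follow the SDRF naming convention; the sample row, the source name `NCBITaxon`, and the function `read_characteristics` are assumptions made for illustration.

```python
import csv
import io
import re

# Hypothetical two-line SDRF fragment (tab-delimited), made up for this example.
SDRF_TEXT = (
    "Source Name\tCharacteristics[Organism]\tTerm Source REF\tCharacteristics[OrganismPart]\n"
    "liver 1\tHomo sapiens\tNCBITaxon\tliver\n"
)


def read_characteristics(sdrf_text):
    """Return one dict per row: {characteristic: {"value": ..., "source": ...}}."""
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = next(reader)
    records = []
    for row in reader:
        record = {}
        for i, column in enumerate(header):
            match = re.fullmatch(r"Characteristics\s*\[(.+)\]", column)
            if not match:
                continue
            # An ontology entry is recognised by the source column that follows it.
            has_source = (
                i + 1 < len(header)
                and header[i + 1] == "Term Source REF"
                and i + 1 < len(row)
            )
            source = row[i + 1] if has_source and row[i + 1] else None
            record[match.group(1)] = {"value": row[i], "source": source}
        records.append(record)
    return records


records = read_characteristics(SDRF_TEXT)
print(records[0])
```

Here `Organism` carries a source and is read as an ontology entry, while `OrganismPart` has no following source column and is read as free text (`source` is `None`).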
  15. 20. Elements to describe an experiment
      • Description of biological sample (research subject, aliquot, i.e., biomaterial) properties: a list of properties, free text or ontology entries; a table (spreadsheet)
      • Description of the assay (e.g., microarray design): a list of features on the array and their properties (sequence, annotation, location); a table (spreadsheet)
      • Data from the assays, raw and processed: files or tables
      • Material and data processing protocols: free text
      • Experiment design (which sample went to which assay and produced which data file): a graph (normally a DAG), which is also a spreadsheet!
  16. 21. MAGE-TAB
      • 1., 5. Sample properties and experiment design: SDRF
      • 2. Array design: ADF (Array design file)
      • 4. Protocols: IDF (Investigation design file)
      • 3. Data files and data matrices
  17. 22. Array Design (ADF)
      • ADF has been around since MAGE-ML times as a way to describe an array
      • A table (spreadsheet): one row per array feature, listing feature coordinates, the sequence, biological annotation, etc.
  18. 23. ADF
  19. 24. Investigation design file (IDF)
      • Lists general information about the experiment and gives a free-text description of all the protocols
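As a rough illustration, an IDF can be read as a list of tag rows: the first cell of each line names a field and the remaining cells hold its values. The tags `Investigation Title`, `Protocol Name`, and `Protocol Description` are IDF field names; the protocol accessions come from the example slides, while the title, the description texts, and the helper names `parse_idf` and `protocols` are placeholders assumed for this sketch.

```python
# Hypothetical IDF content (tab-delimited); the values are placeholders.
IDF_TEXT = (
    "Investigation Title\tExample tissue comparison\n"
    "Protocol Name\tP-XMPL-1\tP-XMPL-2\n"
    "Protocol Description\tMaterial processing (free text)\tNormalisation (free text)\n"
)


def parse_idf(idf_text):
    """Return {tag: [values]} for every non-empty line of the IDF."""
    fields = {}
    for line in idf_text.splitlines():
        if not line.strip():
            continue
        tag, *values = line.split("\t")
        fields[tag.strip()] = [value.strip() for value in values]
    return fields


def protocols(fields):
    """Pair each protocol name with the description in the same column."""
    names = fields.get("Protocol Name", [])
    descriptions = fields.get("Protocol Description", [])
    return dict(zip(names, descriptions))


fields = parse_idf(IDF_TEXT)
print(fields["Investigation Title"][0])
print(protocols(fields))
```

Values in the same column position belong together, which is why the protocol names and descriptions can be paired with `zip`.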
  20. 26. Data files
      • Raw data: native formats (CEL, GenePix)
      • Normalised, summarised data: columns may be individual references
  21. 27. Figure (experiment design graph). Samples: liver 1, liver 2, Kidney 1, Kidney 2, Brain 1, Brain 2 (all Homo sap.). Hybridisations: Hybridisation 1, Hybridisation 2, Hybridisation 3 (all HG_U95A). Data files: Data1.cel, Data2.cel, Data3.cel, FGDM.txt. Protocols: Material processing protocols (P-XMPL-1), Normalisation protocol (P-XMPL-2).
  22. 28. FGEM.txt:

      Genes     Hyb 1 (liver)   Hyb 2 (kidney)   Hyb 3 (brain)
      Gene 1
      Gene 2
      Gene 3
      …
      Gene n
  23. 29. Experimental Factors
      • The most important experimental variables (e.g., organs: liver, kidney, brain in the examples above)
      • Any column in the EDF can be marked as an experimental factor
      • These can be propagated down the edges of the graph to columns in FGEM
      • In FGEM they serve as concise annotation
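The propagation step can be sketched as follows, under stated assumptions: each row of the design table links a sample to a hybridisation, the `organ` column has been marked as an experimental factor, and the factor values are carried along to label the matching data-matrix columns. The sample-to-hybridisation mapping, the column names, and the functions `propagate_factors` and `matrix_header` are hypothetical.

```python
# Hypothetical design rows; which sample goes to which hybridisation is assumed.
DESIGN_ROWS = [
    {"Sample": "liver 1", "organ": "liver", "Hybridisation": "Hyb 1"},
    {"Sample": "Kidney 1", "organ": "kidney", "Hybridisation": "Hyb 2"},
    {"Sample": "Brain 1", "organ": "brain", "Hybridisation": "Hyb 3"},
]


def propagate_factors(design_rows, factor_columns, target_column):
    """Collect factor values per downstream node, e.g. per hybridisation."""
    collected = {}
    for row in design_rows:
        target = row[target_column]
        values = collected.setdefault(target, {f: set() for f in factor_columns})
        for factor in factor_columns:
            values[factor].add(row[factor])
    # Sort the collected values so the result is deterministic.
    return {
        target: {factor: sorted(vals) for factor, vals in by_factor.items()}
        for target, by_factor in collected.items()
    }


def matrix_header(annotations, factor):
    """Build data-matrix column headers annotated with the factor values."""
    columns = [
        f"{target} ({'/'.join(values[factor])})"
        for target, values in annotations.items()
    ]
    return ["Genes"] + columns


annotations = propagate_factors(DESIGN_ROWS, ["organ"], "Hybridisation")
header = matrix_header(annotations, "organ")
print("\t".join(header))
```

If several samples with different factor values feed one hybridisation (pooling, dual channel), the sketch joins the values with `/` rather than choosing one.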
  24. 30. MAGE-TAB
      • Investigation design file: IDF
      • Array design file: ADF
      • Experiment (sample) design file: SDRF
      • Data files and data matrices (FGEM)
  25. 31. Standard design templates
      • simple iterated design;
      • iterated design with technical replicates;
      • iterated design with pooling;
      • iterated designs for dual channel experiments;
      • dual channel iterated designs with dye swap;
      • dual channel iterated designs with a reference sample;
      • dual channel iterated design with a reference and dye swap;
      • dual channel iterated design with a pooled reference;
      • loop design;
      • loop design with dye swap;
      • time series experiments;
      • http://www.mged.org/Workgroups/MAGE/mage.html#mage-tab
  26. 32. MAGE-TAB
      • Any experiment can be represented in MAGE-TAB in a structured way to MIAME granularity
      • Large experiments with a regular design can be represented in a natural way
      • It is possible to create MAGE-TAB files using generic spreadsheet software
      • It is flexible: the granularity of the experiment description can vary
  27. 33. Some points for later discussion
      • What should be the level of prescription on ID column names (source, sample, extract, ..., summary data)?
          • For instance, it is very often unclear to me what is a source and what is a sample
      • Do we need the concept of Experimental Factors at all?
          • An alternative could be optional labelling of variable characteristics as ‘intentional’
  28. 34. Acknowledgements
      • Tim Rayner, Helen Parkinson
      • Cathy Ball, Don Maier
      • Philippe Rocca-Serra (ADF)
      • Michael Miller, Ugis Sarkans, Paul Spellman, Anna Farne
      • Mohammad Shojatalab: MAGE-TAB export from ArrayExpress
      • MAGE working group
      • Funding: NHGRI/NIBIB, MGED sponsors